We consider the problem of detecting the source of a rumor (information
diffusion) in a network based on observations about which set of nodes possess
the rumor. In a recent work [10] by the authors, this question was introduced
and studied. The authors proposed rumor centrality as an estimator for detecting
the source. They established it to be the maximum likelihood estimator
with respect to the popular Susceptible Infected (SI) model with exponential
spreading times for regular trees. They showed that as the size of the infected
graph increases, for a line (2-regular tree) graph, the probability of source
detection goes to 0, while for d-regular trees with $d \ge 3$ the probability of
detection, say $\alpha_d$, remains bounded away from 0 and is less than 1/2. Their
results, however, stop short of providing insights for the heterogeneous setting,
such as irregular trees or the SI model with non-exponential spreading times.

This paper overcomes this limitation and establishes the effectiveness of
rumor centrality for source detection for generic random trees and the SI
model with a generic spreading time distribution. The key result is an
interesting connection between a multi-type continuous time branching process
(an equivalent representation of a generalized Polya's urn, cf. [1]) and the
effectiveness of rumor centrality. Through this, it is possible to quantify the
detection probability precisely. As a consequence, we recover all the results
of [10] as a special case and, more importantly, we obtain a variety of results
establishing the universality of rumor centrality in the context of tree-like
graphs and the SI model with a generic spreading time distribution.

1. Introduction. Imagine a rumor spreads through a network, and after a certain
amount of time, we only know who has heard the rumor and the underlying
network structure. With only this information, is it possible to determine who was
the rumor source? Finding rumor sources is a very general type of problem which
arises in many different contexts. For example, the rumor could be a computer
virus on the Internet, a contagious disease in a human population, or some trend in
a social network. In each of these scenarios, detection of the source is of interest as
this source may be a malicious agent, patient zero, or an influential person.

There is limited prior work on the question of finding the source of a rumor.
However, there has been much work done to understand conditions under which
a rumor becomes an epidemic and how to use these insights to stop the spread
[7], [9], [4]. In the thematically related reconstruction problem, the interest is in
predicting the state of the source based on noisy observations of this state
that are available in the network. Like the reconstruction problem, the source
detection problem is quite complex: the signal of interest (the state of the source in
the reconstruction problem, the rumor source here) is extremely 'low-dimensional',
while the observations (the noisy versions of the state in the reconstruction problem,
the infected nodes here) lie in a very 'high-dimensional' space. This makes
estimation or detection quite challenging. It is not surprising that even to obtain
meaningful answers for the reconstruction problem on tree or tree-like graphs,
sophisticated techniques have been required, cf. [3], [8], [5].

There are two key challenges that need to be addressed to resolve the rumor 
source detection problem. First, how does one actually construct the rumor source 
estimator? The estimator would naturally need to incorporate the topology of the 
underlying network, but it is not obvious in what manner. Second, what are the 
fundamental limits to this rumor source detection problem? In particular, how well 
can one find the rumor source, what is the distribution of the error, and how does 
the network structure affect one's ability to find the rumor source? 

In a recent work [10], the authors introduced and studied the problem of rumor
source detection in networks. They proposed rumor centrality, a graph-score
function, for ranking the importance of nodes as the source. They showed that the node
with maximal rumor centrality is the maximum likelihood (ML) estimator in the
context of regular trees and the SI model with homogeneous exponential spreading
times. They showed the effectiveness of this estimator by establishing that the
rumor source is found with non-trivial probability for regular trees and geometric
trees under this setting. The model and precise results from [10] are described in
Section 2.

The key limitations of this prior work are: (i) the results do not quantify the exact
detection probability, say $\alpha_d$, for d-regular trees under the proposed maximum
likelihood estimator, other than $\alpha_2 = 0$, $\alpha_3 = 0.25$ and $0 < \alpha_d < 0.5$ for $d \ge 4$,
for the SI model with exponential spreading times; (ii) the results do not quantify
the magnitude of the error in the event of not being able to identify the source; and,
more generally, (iii) the results do not provide any insights into how the estimator
behaves for generic heterogeneous tree (or tree-like) graphs under the SI model
with a generic spreading time distribution.

1.1. Summary of results. The primary reason behind the limitations of the
results in [10] is that the analytic method employed there is quite specific to
regular trees with homogeneous exponential spreading times. To overcome these
limitations, as the main contribution of this work we introduce a novel analysis
method that utilizes connections to the classical multi-class Markov branching
process (MCMBP) (equivalently, the generalized Polya's urn (GPU)). As a consequence,
we are able to quantify the probability of the error event precisely and thus
eliminate the shortcomings of the prior work. The following is a summary of the
key results (see Section 3 for precise statements):

1. Regular tree, SI model with exponential spreading time: we characterize $\alpha_d$,
the detection probability for d-regular trees, for all d. Specifically, for $d \ge 3$,

$$\alpha_d = d\, I_{1/2}\Big(\frac{1}{d-2}, \frac{d-1}{d-2}\Big) - (d - 1).$$

In the above, $I_x(a, b)$ is the regularized incomplete Beta function with parameters $a, b$
evaluated at $x \in [0, 1]$ (see (3.1)). This implies that $\alpha_d > 0$ for $d \ge 3$, with
$\alpha_3 = 0.25$ and $\alpha_d \to 1 - \ln 2$ as $d \to \infty$. Further, we show that the probability
of rumor centrality estimating the $k$th infected node as the source decays
as $\exp(-\Theta(k))$. The precise results are stated as Theorem 3.1 and Corollaries 1
and 2.

2. Generic random tree, SI model with exponential spreading time: for generic
random trees (see Section 3.2 for a precise definition) which are expanding, we
establish that there is a strictly positive probability of correct detection using
rumor centrality. Furthermore, the probability of rumor centrality estimating
the $k$th infected node as the source decays as $O(1/k)$. The precise results are
stated as Theorem 3.2 and Theorem 3.3.

3. Geometric tree, SI model with spreading time with finite moment generating 
function: for any geometric tree (see Section 3.2.2 for precise definition), we 
establish that the probability of correct detection goes to 1 as the number of 
infected nodes increases. The precise result is stated as Theorem 3.4. 

4. Generic random tree, SI model with generic spreading time: for generic
expanding random trees with generic spreading times (see Section 3.2 for the
definition), we establish that the probability of correct source detection remains
bounded away from 0. The precise result is stated as Theorem 3.2.

The above results collectively establish that, even though rumor centrality is an
ML estimator only for regular trees and the SI model with exponential spreading
times, it is universally effective with respect to heterogeneity in the tree structure
and the spreading time distribution. Its effectiveness for generic random trees
immediately implies its utility for finding sources in sparse random graphs that are
locally tree-like. Examples include Erdos-Renyi and random regular graphs. A brief
discussion to this effect can be found in Section 3.3.

2. Model, problem statement and rumor centrality. We start by describing
the model and problem statement, followed by a quick recall of the precise results
from [10]. In the process, we recall the definition of rumor centrality and of the
source estimator based on it, as introduced in [10].
2.1. Model. Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a possibly infinite connected graph. Let $v \in \mathcal{V}$
be a rumor source from which a rumor starts spreading at time 0. The rumor
spreads in the graph as per the classical Susceptible Infected (SI) model.
Specifically, each edge, say $e = (u_1, u_2)$, has a spreading time, say $S_e$, associated with it.
If node $u_1$ gets infected at time $t_1$, then at time $t_1 + S_e$ the infection spreads from
$u_1$ to $u_2$ (if $u_2$ is not already infected). A node, once infected, remains infected.
The spreading times associated with edges are independent random variables with
identical distribution. Let $F : \mathbb{R} \to [0, 1]$ denote the cumulative distribution function
of the spreading time distribution. We shall assume that the distribution is
non-negative valued, i.e. $F(0) = 0$, and that it is non-atomic at 0, i.e. $F(0^+) = 0$.
Since it is a cumulative distribution function, it is non-decreasing and
$\lim_{x \to \infty} F(x) = 1$. The simplest, homogeneous SI model has exponential spreading
times with parameter $\lambda > 0$, with $F(x) = 1 - \exp(-\lambda x)$ for $x \ge 0$. In [10], the results
were restricted to this homogeneous exponential spreading time setting. In this paper,
we shall develop results for arbitrary spreading time distributions consistent with
the above assumptions.
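Under this model a node's infection time is the minimum, over paths from the source, of the sum of the edge spreading times along the path, so the process can be simulated with a Dijkstra-style event queue. The following Python sketch assumes a graph stored as an adjacency dict and a user-supplied sampler for $F$; `simulate_si` and `regular_tree` are hypothetical helpers for experimentation, not code from [10].

```python
import heapq
import random


def simulate_si(adj, source, t_obs, sample_time):
    """Simulate the SI model from `source` and return {node: infection time}
    for every node infected by time t_obs.

    adj: dict node -> list of neighbours (hypothetical representation).
    sample_time: zero-argument callable drawing one i.i.d. spreading time S_e.
    An edge's time is drawn when its earlier-infected endpoint is settled; the
    reverse direction is never used, so no edge's time is drawn twice.
    """
    infected = {}
    frontier = [(0.0, source)]  # (tentative infection time, node)
    while frontier:
        t, u = heapq.heappop(frontier)
        if t > t_obs:
            break  # the heap is ordered, so every remaining candidate is later
        if u in infected:
            continue  # already reached along a faster path
        infected[u] = t
        for v in adj[u]:
            if v not in infected:
                heapq.heappush(frontier, (t + sample_time(), v))
    return infected


def regular_tree(d, depth):
    """Adjacency dict of the d-regular tree truncated at `depth`, rooted at 0."""
    adj = {0: []}
    level, next_id = [0], 1
    for _ in range(depth):
        new_level = []
        for u in level:
            for _ in range(d if u == 0 else d - 1):
                adj[u].append(next_id)
                adj[next_id] = [u]
                new_level.append(next_id)
                next_id += 1
        level = new_level
    return adj


if __name__ == "__main__":
    rng = random.Random(0)
    tree = regular_tree(3, 12)
    times = simulate_si(tree, 0, 3.0, lambda: rng.expovariate(1.0))
    print(len(times), "nodes infected by t = 3")
```

Passing a different `sample_time` (for example a log-normal or uniform sampler) gives the non-exponential SI models studied later in the paper.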

Problem statement. Given the above spreading model, we observe the rumor
infected graph $G = (V, E)$ at some time $t > 0$. We do not know the value of $t$ or
the realization of the spreading times on edges $e \in \mathcal{E}$; we only know the rumor
infected nodes $V \subset \mathcal{V}$ and the edges between them, $E = (V \times V) \cap \mathcal{E}$. The goal is to
find the rumor source (among $V$) given $G$.

2.2. Rumor centrality: an estimator. To solve this problem, the notion of rumor
centrality was introduced in [10]. Rumor centrality is a 'graph score' function.
That is, it takes $G = (V, E)$ as input and assigns a positive number or score to each
of the vertices. The estimated source is then the vertex with maximal score, or rumor
centrality (ties broken uniformly at random). The estimated source is called the
'rumor center'. We start with the precise description of rumor centrality for a tree¹
graph $G$: the rumor centrality of node $u$ with respect to $G = (V, E)$ is

$$R(u, G) = |V|! \prod_{w \in V} \frac{1}{T_w^u}, \tag{2.1}$$

where $T_w^u$ is the size of the subtree of $G$ that is rooted at $w$ and points away from
$u$. For example, in Figure 1, let $u$ be node 1. Then $|V| = 5$; the subtree sizes are
$T_1^1 = 5$, $T_2^1 = 3$, $T_3^1 = T_4^1 = T_5^1 = 1$, and hence $R(1, G) = 5!/(5 \cdot 3) = 8$: exactly
equal to the number of distinct spreading orders starting from 1. In [10], a linear-time
algorithm is described to compute the rumor centrality of all nodes, building on the
relation $R(u, G)/R(v, G) = T_u^v / T_v^u$ for neighboring nodes $u, v \in V$ (i.e. $(u, v) \in E$).

The rumor centrality of a given node $u \in V$ for a tree, given by (2.1), is precisely
the number of distinct spreading orders that could lead to the rumor infected graph
$G$ starting from $u$. This is equivalent to computing the number of linear extensions
of the partial order imposed on $V$ by the graph $G$ due to the causality constraints of
rumor spreading. Under the SI model with homogeneous exponential spreading times
and a regular tree, it turns out that each of the spreading orders is equally likely.
Therefore, rumor centrality turns out to be the maximum likelihood (ML) estimator
for the source in this specific setting (cf. [10]). In general, the likelihood of each
node $u \in V$ being the source given $G$ is proportional to a weighted sum over the
distinct spreading orders starting from $u$, where the weight of a spreading order
could depend on the details of the graph structure and the spreading time
distribution of the SI model. For a tree graph and the SI model with homogeneous
exponential spreading times, as mentioned above, such a quantity can be computed
in linear time. In general, however, this could be complicated. For example, computing
the number of linear extensions of a given partial order is known to be #P-complete
[2]. While there are algorithms for approximately sampling linear extensions of
a partial order [6], [10] proposed a simpler alternative for general graphs.

Fig 1. Example of rumor centrality calculation for a 5 node network. The rumor centrality of node
1 is 8 because there are 8 spreading orders that it can originate, which are shown in the figure:
{1,3,2,4,5}, {1,2,3,4,5}, {1,2,4,3,5}, {1,2,4,5,3}, {1,3,2,5,4}, {1,2,3,5,4}, {1,2,5,3,4}, {1,2,5,4,3}.

¹ We shall call an undirected graph a tree if it is connected and does not have any cycles.
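Equation (2.1) and its spreading-order interpretation can be checked on the tree of Figure 1 with a short Python sketch. The function names are hypothetical. `all_rumor_centralities` propagates values along tree edges using the neighbor relation $R(v, G) = R(u, G)\, T_v^u / T_u^v$ with $T_u^v = |V| - T_v^u$, i.e. a single linear pass in the spirit of the algorithm described in [10] (not necessarily its exact form), while `count_spreading_orders` enumerates orders by brute force and is only usable on tiny trees.

```python
import math
from fractions import Fraction
from itertools import permutations


def subtree_sizes(adj, root):
    """Return (size, parent, order): size[w] = T_w^root, the number of nodes in
    the subtree rooted at w that points away from `root`; `order` is BFS order."""
    parent = {root: None}
    order = [root]
    for u in order:  # `order` grows while we iterate, giving a BFS
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                order.append(v)
    size = dict.fromkeys(order, 1)
    for u in reversed(order):  # children are accumulated before their parents
        if parent[u] is not None:
            size[parent[u]] += size[u]
    return size, parent, order


def rumor_centrality(adj, u):
    """R(u, G) = |V|! / prod_w T_w^u, as in (2.1), in exact integer arithmetic."""
    size, _, _ = subtree_sizes(adj, u)
    return math.factorial(len(adj)) // math.prod(size.values())


def all_rumor_centralities(adj):
    """R(v, G) for every node in one pass, using R(v) = R(u) * T_v^u / T_u^v
    for each tree edge (u, v), where T_u^v = |V| - T_v^u."""
    root = next(iter(adj))
    size, parent, order = subtree_sizes(adj, root)
    n = len(adj)
    R = {root: Fraction(math.factorial(n), math.prod(size.values()))}
    for v in order[1:]:  # a parent always precedes its children in BFS order
        u = parent[v]
        R[v] = R[u] * size[v] / (n - size[v])
    return {v: int(r) for v, r in R.items()}


def count_spreading_orders(adj, u):
    """Brute-force count of orders starting at u in which every later node has an
    already-infected neighbour (exponential time: tiny graphs only)."""
    count = 0
    for perm in permutations([v for v in adj if v != u]):
        infected = {u}
        for v in perm:
            if not any(w in infected for w in adj[v]):
                break
            infected.add(v)
        else:
            count += 1
    return count


# Tree of Figure 1: node 1 is adjacent to 2 and 3, node 2 to 4 and 5.
fig1 = {1: [2, 3], 2: [1, 4, 5], 3: [1], 4: [2], 5: [2]}
print(rumor_centrality(fig1, 1), count_spreading_orders(fig1, 1))  # 8 8
print(all_rumor_centralities(fig1))  # {1: 8, 2: 12, 3: 2, 4: 3, 5: 3}
```

On this example the true source in Figure 1, node 1, is not the rumor center: node 2 has the larger score, since its three branches are more balanced.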

Definition 1 (Rumor Centrality). Given node $u \in V$ in graph $G = (V, E)$,
let $T \subseteq G$ denote the breadth-first search tree of $u$ with respect to $G$. Then the
rumor centrality of $u$ with respect to $G$ is obtained by computing it as per (2.1)
with respect to $T$. The estimated rumor source is the one with maximal rumor
centrality (ties broken uniformly at random).
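Definition 1 can be sketched in Python as follows, assuming a connected graph given as an adjacency dict. Scores are kept in log space (via `math.lgamma`) because $|V|!$ overflows floating point quickly. When several breadth-first search trees exist, the one built here depends on the neighbor order in the dict; this tie-breaking is an assumption, since Definition 1 leaves it open. The names `bfs_log_rumor_centrality` and `rumor_center` are hypothetical.

```python
import math
import random
from collections import deque


def bfs_log_rumor_centrality(adj, u):
    """log R(u, G), with (2.1) evaluated on the BFS tree of u (Definition 1)."""
    parent = {u: None}
    order = []
    queue = deque([u])
    while queue:
        x = queue.popleft()
        order.append(x)
        for y in adj[x]:
            if y not in parent:  # first discovery fixes the BFS-tree parent
                parent[y] = x
                queue.append(y)
    size = dict.fromkeys(order, 1)  # size[x] = T_x^u in the BFS tree
    for x in reversed(order):
        if parent[x] is not None:
            size[parent[x]] += size[x]
    return math.lgamma(len(order) + 1) - sum(math.log(s) for s in size.values())


def rumor_center(adj, rng):
    """Node with maximal rumor centrality, ties broken uniformly at random."""
    scores = {v: bfs_log_rumor_centrality(adj, v) for v in adj}
    best = max(scores.values())
    ties = [v for v, s in scores.items() if math.isclose(s, best, abs_tol=1e-9)]
    return rng.choice(ties)


# From any node of the 4-cycle the BFS tree has subtree sizes 4, 2, 1, 1,
# so every node gets R = 4!/(4*2) = 3.
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print([round(math.exp(bfs_log_rumor_centrality(cycle4, v))) for v in cycle4])  # [3, 3, 3, 3]
```

On a tree the BFS tree is the tree itself, so this reduces to (2.1); the per-node BFS makes the whole estimator $O(|V| \cdot |E|)$ on general graphs.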

2.3. Prior results. In [10], the authors established that rumor centrality is the
maximum likelihood estimator for the rumor source when the underlying graph $\mathcal{G}$
is a regular tree. They studied the effectiveness of this ML estimator for such
regular trees. Specifically, suppose we observe the $n(t)$-node rumor infected graph $G$
after time $t$, which is a subgraph of $\mathcal{G}$. Let $C^k_{n(t)}$ be the event that the source
estimated as per rumor centrality is the $k$th infected node; thus $C^1_{n(t)}$ corresponds
to the event of correct detection. The following are key results from [10]:
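Theorem 2.1 ([10]). Let $\mathcal{G}$ be a d-regular infinite tree with $d \ge 2$. Let

$$\alpha_d^L = \liminf_{t \to \infty} \mathbb{P}\big(C^1_{n(t)}\big) \le \limsup_{t \to \infty} \mathbb{P}\big(C^1_{n(t)}\big) = \alpha_d^U. \tag{2.2}$$

Then,

$$\alpha_2^L = \alpha_2^U = 0, \quad \alpha_3^L = \alpha_3^U = \tfrac{1}{4}, \quad \text{and} \quad 0 < \alpha_d^L \le \alpha_d^U < \tfrac{1}{2} \quad \forall\, d \ge 4. \tag{2.3}$$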

3. Main results. We state the main results of this paper. In a nutshell, our
results concern the characterization of the probability of $C^k_{n(t)}$ for any $k \ge 1$ for
large $t$ when $\mathcal{G}$ is a generic tree. As a consequence, they provide a characterization
of the performance for sparse random graphs.

3.1. Regular tree, SI model with exponential spreading time. We first look at
rumor source detection on regular trees with degree $d \ge 3$, where rumor centrality
is an exact ML estimator when the spreading times are exponentially distributed.
Our results will utilize properties of Beta random variables. We recall that the
regularized incomplete Beta function $I_x(a, b)$ is the probability that a Beta random
variable with parameters $a$ and $b$ is less than $x \in [0, 1]$:

$$I_x(a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \int_0^x t^{a-1} (1 - t)^{b-1}\, dt, \tag{3.1}$$

where $\Gamma(\cdot)$ is the standard Gamma function. For regular trees of degree $d \ge 3$ we
obtain the following result.

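Theorem 3.1. Let $\mathcal{G}$ be a d-regular infinite tree with $d \ge 3$. Assume a rumor
spreads on $\mathcal{G}$ as per the SI model with exponential distribution with rate $\lambda$. Then,
for any $k \ge 1$,

$$\lim_{t \to \infty} \mathbb{P}\big(C^k_{n(t)}\big) = I_{1/2}\Big(k - 1 + \frac{1}{d-2},\, 1 + \frac{1}{d-2}\Big) + (d - 1)\Big(I_{1/2}\Big(\frac{1}{d-2},\, k + \frac{1}{d-2}\Big) - 1\Big). \tag{3.2}$$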
For $k = 1$, Theorem 3.1 yields that $\alpha_d^L = \alpha_d^U = \alpha_d$ for all $d \ge 3$, where

$$\alpha_d = d\, I_{1/2}\Big(\frac{1}{d-2}, \frac{d-1}{d-2}\Big) - (d - 1). \tag{3.3}$$

More interestingly:
Corollary 1.

$$\lim_{d \to \infty} \alpha_d = 1 - \ln 2 \approx 0.307. \tag{3.4}$$
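Formula (3.3) and Corollary 1 are easy to check numerically. The Python sketch below uses only the standard library, an assumption made to keep it self-contained; a library routine such as SciPy's `scipy.special.betainc` would serve equally well. It evaluates $I_x(a, b)$ from (3.1) by composite Simpson's rule after the substitution $t = u^{1/a}$, which makes the integrand bounded when $0 < a \le 1$ and $b \ge 1$, the only case (3.3) needs. The helper names are hypothetical.

```python
import math


def reg_inc_beta(x, a, b, n=20000):
    """Regularized incomplete Beta function I_x(a, b) of (3.1), for 0 < a <= 1,
    b >= 1 and 0 <= x < 1, by composite Simpson's rule (n must be even).

    Substituting t = u**(1/a) turns int_0^x t^(a-1) (1-t)^(b-1) dt into
    (1/a) int_0^(x^a) (1 - u^(1/a))^(b-1) du, whose integrand is bounded.
    """
    upper = x ** a
    h = upper / n

    def f(u):
        return (1.0 - u ** (1.0 / a)) ** (b - 1.0)

    total = f(0.0) + f(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    integral = total * h / 3.0 / a
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * integral


def alpha(d):
    """Asymptotic correct-detection probability alpha_d of (3.3), d >= 3."""
    a = 1.0 / (d - 2)
    return d * reg_inc_beta(0.5, a, 1.0 + a) - (d - 1)


for d in (3, 4, 10, 100, 1000):
    print(d, round(alpha(d), 4))
print("1 - ln 2 =", round(1 - math.log(2), 4))
```

For $d = 4$ the integral has the closed form $I_{1/2}(1/2, 3/2) = 1/2 + 1/\pi$, so $\alpha_4 = 4/\pi - 1 \approx 0.2732$, a convenient check. Because $\alpha_d$ is a difference of two quantities of size about $d$, quadrature errors are amplified by a factor of order $d$; the fine grid keeps this negligible for the degrees shown.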

For any $d \ge 3$, we can obtain a simple upper bound for Theorem 3.1 which provides
the insight that the probability of error in the estimation decays exponentially
with the error distance from the true source (not the number of hops in the graph,
but the position in the chronological order of infection).

Corollary 2. When $\mathcal{G}$ is a d-regular infinite tree, for any $k \ge 1$,

$$\lim_{t \to \infty} \mathbb{P}\big(C^k_{n(t)}\big) \le k(k + 1)\Big(\frac{1}{2}\Big)^{k} = \exp(-\Theta(k)).$$
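As a hand check of the exponential decay, consider $d = 3$, where $1/(d-2) = 1$ and formula (3.2) of Theorem 3.1 reads $\lim_{t\to\infty}\mathbb{P}(C^k_{n(t)}) = I_{1/2}(k, 2) + 2\big(I_{1/2}(1, k+1) - 1\big)$. The Beta$(k, 2)$ and Beta$(1, k+1)$ distribution functions are the elementary polynomials $(k+1)x^k - k x^{k+1}$ and $1 - (1-x)^{k+1}$, which gives:

$$
\begin{aligned}
I_{1/2}(k, 2) &= (k+1)\Big(\tfrac12\Big)^{k} - k\Big(\tfrac12\Big)^{k+1} = \frac{k+2}{2^{k+1}}, \\
I_{1/2}(1, k+1) &= 1 - \Big(\tfrac12\Big)^{k+1}, \\
\lim_{t \to \infty} \mathbb{P}\big(C^k_{n(t)}\big) &= \frac{k+2}{2^{k+1}} - \frac{2}{2^{k+1}} = \frac{k}{2^{k+1}}.
\end{aligned}
$$

For $k = 1$ this recovers $\alpha_3 = 1/4$. The values $k/2^{k+1}$ sum to 1 over $k \ge 1$, consistent with the events $C^k_{n(t)}$ partitioning the outcomes, and they decay as $\exp(-\Theta(k))$.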

To provide intuition, we plot the asymptotic error distribution $\lim_{t \to \infty} \mathbb{P}(C^k_{n(t)})$
for regular trees of different degrees in Figure 2. As can be seen, for degrees greater
than 4, all the error distributions fall on top of each other, and the probability of
detecting the $k$th infected node as the source decays exponentially in $k$. We also plot the
upper bound from Corollary 2. As can be seen, this upper bound captures the rate
of decay of the error probability. Thus we see tight concentration of the error for
this class of graphs. Figure 3 plots the asymptotic correct detection probability $\alpha_d$
versus degree $d$ for these regular trees. It can be seen that the detection probability
starts at 1/4 for degree 3 and rapidly converges to $1 - \ln 2$ as the degree goes to
infinity.
Fig 3. $\alpha_d$ versus degree $d$ for regular trees.



3.2. Generic random tree, SI model with general spreading time distribution.
The above precise results were obtained using the memoryless property of the
exponential distribution and the regularity of the trees. Next, we wish to look at a
more general setting, both in terms of tree structures and spreading time
distributions. In this more general setting, while we cannot obtain precise values for the
detection and error probabilities, we are able to make statements about the
non-triviality of the detection probability of rumor centrality. When restricted to
exponential spreading times for generic trees, we can identify bounds on the error
probability as well. Let us start by defining what we mean by generic random trees
through a generative model.

Definition 2 (Random Trees). A random tree is generated as follows. Starting
with a root node, add a random number of children, say $\eta_0$, to the root, with
$\eta_0 \in \{0, 1, \ldots\}$ having some distribution $\mathcal{D}_0$. If $\eta_0 \neq 0$, then to each child of
the root, add a random number of children chosen as per a distribution $\mathcal{D}$ over
$\{0, 1, \ldots\}$. Recursively, to each newly added node, add independently a random
number of children as per distribution $\mathcal{D}$. When no children are added to a node, that
branch of the tree terminates there.
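The generative model of Definition 2 can be sketched in Python as follows. The offspring laws are passed in as samplers (hypothetical callables `sample_d0` and `sample_d`), and the tree is truncated at a finite depth purely so that the output is finite, which is an assumption of the sketch and not part of the definition. A Poisson sampler (Knuth's multiplication method, adequate for small means) is included for the Erdos-Renyi-like case in the next paragraph.

```python
import math
import random


def random_tree(sample_d0, sample_d, max_depth, rng):
    """Adjacency dict of a random tree as in Definition 2, rooted at node 0.

    sample_d0(rng) / sample_d(rng) return the number of children of the root and
    of every other node, respectively. Growth stops at `max_depth` (an assumption
    for finiteness) or earlier if every branch has terminated.
    """
    adj = {0: []}
    level, next_id = [0], 1
    for depth in range(max_depth):
        new_level = []
        for u in level:
            k = sample_d0(rng) if depth == 0 else sample_d(rng)
            for _ in range(k):
                adj[u].append(next_id)
                adj[next_id] = [u]
                new_level.append(next_id)
                next_id += 1
        if not new_level:
            break  # every branch received zero children and terminated
        level = new_level
    return adj


def poisson(mean):
    """Return a Poisson(mean) sampler using Knuth's multiplication method."""
    threshold = math.exp(-mean)

    def sample(rng):
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        return k

    return sample


rng = random.Random(0)
regular = random_tree(lambda r: 3, lambda r: 2, 4, rng)      # 3-regular tree, depth 4
er_like = random_tree(poisson(10.0), poisson(10.0), 3, rng)  # Galton-Watson, Poisson(10)
print(len(regular), len(er_like))
```

With constant samplers returning $d$ and $d - 1$ the output is the truncated d-regular tree; with $\mathcal{D}_0 = \mathcal{D}$ the output is a truncated Galton-Watson tree.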

The generative model described above is precisely the standard Galton-Watson
branching process if $\mathcal{D}_0 = \mathcal{D}$. If we take $\mathcal{D}_0$ and $\mathcal{D}$ to be deterministic distributions
with support on $d$ and $d - 1$ respectively, then it gives the d-regular tree. For a random
d-regular graph on $n$ nodes, as $n$ grows, the neighborhood of a randomly chosen
node in the graph converges (in distribution, locally) to such a d-regular tree. If
we take $\mathcal{D}_0 = \mathcal{D}$ to be a Poisson distribution with mean $c > 0$, then it asymptotically
equals (in distribution) the local neighborhood of a randomly chosen node in
a sparse Erdos-Renyi graph as the number of nodes grows. Recall that a (sparse)
Erdos-Renyi graph on $n$ nodes with parameter $c$ is generated by selecting each of
the $\binom{n}{2}$ edges to be present with probability $c/n$ independently. Effectively, the
random trees described above capture the local structure of sparse random graphs
reasonably well. For that reason, establishing the effectiveness of rumor centrality
for source detection on such trees provides insights into its effectiveness for sparse
random graph models.

We shall consider spreading time distributions to be generic: let $F : \mathbb{R} \to [0, 1]$
be the cumulative distribution function of the spreading time distribution. Then
$F(0) = 0$, $F(0^+) = 0$ and $\lim_{t \to \infty} F(t) = 1$. We state the following result about the
effectiveness of rumor centrality with a generic spreading time distribution.

Theorem 3.2. Let $\eta_0$, distributed as $\mathcal{D}_0$, be such that $\Pr(\eta_0 \ge 3) > 0$, and
let $\eta$, distributed as per $\mathcal{D}$, be such that $\mathbb{E}[\eta^2] < \infty$ and $\mathbb{E}[\eta] > 1$. Suppose the
rumor starts from the root of the random tree generated as per distributions $\mathcal{D}_0$ and $\mathcal{D}$
as described above and spreads as per the SI model with a generic spreading time
distribution as discussed above. Then,

$$\liminf_{t \to \infty} \mathbb{P}\big(C^1_{n(t)}\big) > 0.$$

The above result says that irrespective of the structure of the random trees,
the spreading time distribution and the time elapsed, there is a non-trivial probability
of detecting the root as the source by rumor centrality. The interesting aspect of the
result is that this non-trivial detection probability is established by studying events
in which the tree grows without bound. In specific models (e.g. a Poisson distribution
for $\mathcal{D}_0$) one may derive such a bound by showing that the probability of having
an empty tree is non-trivial. But such events are trivial and are not of
much interest to us (neither mathematically nor motivationally).

3.2.1. Generic random tree, SI model with exponential spreading time distribution.
Extending the results of Theorem 3.2 to explicitly bound the probability of the
error event, $\mathbb{P}(C^k_{n(t)})$, for a generic spreading time distribution seems rather
challenging. Here we provide a result for generic random trees with exponential
spreading times.

Theorem 3.3. Consider the setup of Theorem 3.2 with spreading times being
homogeneous exponential distributions with (unknown, but fixed) parameter $\lambda > 0$.
In addition, let $\mathcal{D}_0 = \mathcal{D}$. Let (with some abuse of notation) $\eta_i$ denote the number
of children of the $i$th infected node. Then, conditioned on the event that (i) $\eta_k \ge \ldots$
and (ii) $\sum_{i=1}^{k} \eta_i \ge c\, k\, \eta_k$ for some $c > 1$,

$$\mathbb{P}\big(C^k_{n(t)}\big) \le \ldots \tag{3.5}$$

The above result establishes an explicit upper bound on the error event under
the occurrence of a specific event. The bound is relatively weaker ($1/k$ versus
$\exp(-\Theta(k))$) and holds under specific conditions. However, it applies to
essentially any generic random tree and demonstrates that the probability of
(mis-)estimating later infected nodes as the source decreases.

3.2.2. Geometric trees. The trees considered thus far, the d-regular tree with
$d \ge 3$ or the random tree with $\mathbb{E}[\eta] > 1$, grow exponentially in size with the
diameter of the tree. This is in contrast with line graphs, or d-regular trees with
$d = 2$, which grow only linearly in diameter. It can be easily seen that the probability
of correct detection, $\mathbb{P}(C^1_{n(t)})$, will scale as $\Theta(1/\sqrt{t})$ for line graphs as long as the
spreading time distribution has non-trivial variance (see [10] for a proof of this
statement for the SI model with exponential spreading times). In contrast, the results
of this paper stated thus far suggest that expanding trees allow for non-trivial
detection as $t \to \infty$. Thus, qualitatively, line (tree) graphs and expanding trees are
quite different: one does not allow detection while the other does. To understand where
the precise detectability threshold lies, here we study the polynomially growing
geometric trees.

Definition 3 (Geometric Tree). This family of rooted, non-regular trees is
parameterized by constants $\alpha$, $b$ and $c$, with $0 < b \le c$, and a root $v^*$. Let $d^*$ be
the degree of the root $v^*$ and let the $d^*$ neighboring subtrees of $v^*$ be denoted by
$T_1, \ldots, T_{d^*}$. Consider the $i$th subtree $T_i$, $1 \le i \le d^*$. Let $v$ be any node in $T_i$ and
let $n^i(v, r)$ be the number of nodes in $T_i$ at distance exactly $r$ from the node $v$.
Then we require that for all $1 \le i \le d^*$ and $v \in T_i$,

$$b\, r^{\alpha} \le n^i(v, r) \le c\, r^{\alpha}. \tag{3.6}$$

The condition imposed by (3.6) states that each of the neighboring subtrees of
the root should satisfy polynomial growth (with exponent $\alpha \ge 0$) and regularity
properties. The parameter $\alpha$ characterizes the growth of the subtrees and the
ratio $c/b$ describes the regularity of the subtrees. If $c/b \approx 1$ then the subtrees are
somewhat regular, whereas if the ratio is much greater than 1, there is substantial
heterogeneity in the subtrees. Note that the line graph is a geometric tree with
$\alpha = 0$, $b = 1$ and $c = 2$.

We shall consider the scenario where the rumor starts from the root node of a rooted
geometric tree. We shall show that rumor centrality detects the root as the source
with asymptotic probability 1 for a generic spreading time distribution with
exponential tails. This is quite interesting given the fact that rumor centrality is an
ML estimator only for regular trees with exponential spreading times. The precise
result is stated next.

Theorem 3.4. Let $\mathcal{G}$ be a rooted geometric tree as described above with
parameters $\alpha > 0$, $0 < b \le c$ and root node $v^*$ with degree $d^*$ such that

$$d^* > \max\Big(2, \frac{c}{b} + 1\Big).$$

Suppose the rumor starts spreading on $\mathcal{G}$ from $v^*$ as per the SI model with a
generic spreading time distribution whose cumulative distribution function
$F : \mathbb{R} \to [0, 1]$ is such that (a) $F(0) = 0$, (b) $F(0^+) = 0$, and (c) if $X$ is a random
variable distributed as per $F$ then $\mathbb{E}[\exp(\theta X)] < \infty$ for $\theta \in (-\varepsilon, \varepsilon)$ for some
$\varepsilon > 0$. Then

$$\lim_{t \to \infty} \mathbb{P}\big(C^1_{n(t)}\big) = 1.$$

A similar theorem was proven in [10], but only for the SI model with an
exponential distribution. We have now extended this result to generic spreading
time distributions whose moment generating function is finite near 0. Theorem 3.4
says that $\alpha = 0$ and $\alpha > 0$ serve as a threshold for non-trivial detection: for
$\alpha = 0$, the graph is essentially a line graph, so we would expect the detection
probability to go to 0 as $t \to \infty$, as discussed above, but for $\alpha > 0$ the detection
probability converges to 1 as $t \to \infty$.

3.3. Locally tree-like graphs: discussion. The results stated above are
primarily for tree-structured graphs of various kinds. On one hand, they are specialized.
On the other hand, they do serve as local approximations for a variety of sparse
random graph models. As discussed earlier, for a random d-regular graph over $m$
nodes, a randomly chosen node's local neighborhood (say, up to distance $o(\log m)$)
is a tree with high probability. Similarly, consider an Erdos-Renyi graph over $m$
nodes with each edge being present with probability $p = c/m$ independently for
any $c > 0$ ($c > 1$ is an interesting regime due to the existence of a giant
component). Then again, a randomly chosen node's local neighborhood (up to
distance $o(\log m)$) is a tree and is distributionally equivalent (in the large $m$ limit)
to a random tree with Poisson degree distribution.

Given such 'locally tree-like' structural properties, if a rumor spreads on a random
d-regular graph or a sparse Erdos-Renyi graph for time $o(\log m)$ starting from
a random node, then rumor centrality can detect the source with guarantees given
by Theorems 3.1 and 3.2. Thus, although the results of this paper are for
tree-structured graphs, they have meaningful implications for tree-like sparse
graphs.
Fig 4. $\mathbb{P}(C^k_{500})$ versus $k$ for Erdos-Renyi graphs with mean degree 10 and 20. Also shown with a
black dashed line is $\lim_{t \to \infty} \mathbb{P}(C^k_{n(t)})$ for a degree 10,000 random regular graph.

For the purpose of illustration, we conducted some simulations for Erdos-Renyi
graphs that are reported in Figure 4. We generated graphs with $m = 50{,}000$ nodes
and edge probabilities $p = c/m$ for $c = 10$ and $c = 20$. The rumor graph
contained $n = 500$ nodes. We ran 10,000 rumor spreading simulations to obtain the
empirical error distributions plotted in Figure 4. As can be seen, the error drops
off exponentially, very similar to the regular tree error distribution. In fact, we also
plot the distribution for a regular tree of degree 10,000, and it can be seen that the
error decays at a similar, exponential rate. This indicates that even though there is
substantial randomness in the graph, the asymptotic rumor source detection error
distribution behaves as though it were a regular tree graph. This result also suggests
that the bounds in Theorem 3.3 are loose for this graph.
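An experiment of this kind can be reproduced in miniature with the Python sketch below. It is an illustrative reconstruction under stated assumptions, not the code used for Figure 4: Erdos-Renyi graphs are generated with the Batagelj-Brandes geometric-skipping method, spreading times are exponential with rate 1, the spread is stopped once $n$ nodes are infected, rumor centrality follows Definition 1, and the default sizes are far smaller than the $m = 50{,}000$, $n = 500$ and 10,000 runs used above. All function names are hypothetical.

```python
import heapq
import math
import random
from collections import deque


def erdos_renyi(m, p, rng):
    """G(m, p) as an adjacency dict, via Batagelj-Brandes geometric skipping."""
    adj = {v: [] for v in range(m)}
    log_q = math.log(1.0 - p)
    v, w = 1, -1
    while v < m:
        w += 1 + int(math.log(1.0 - rng.random()) / log_q)
        while w >= v and v < m:
            w -= v
            v += 1
        if v < m:
            adj[v].append(w)
            adj[w].append(v)
    return adj


def spread_first_n(adj, source, n, rng):
    """SI spreading with Exp(1) edge times; returns the first n infected nodes in
    infection order (fewer if the source's component is smaller)."""
    order, done = [], set()
    frontier = [(0.0, source)]
    while frontier and len(order) < n:
        t, u = heapq.heappop(frontier)
        if u in done:
            continue
        done.add(u)
        order.append(u)
        for v in adj[u]:
            if v not in done:
                heapq.heappush(frontier, (t + rng.expovariate(1.0), v))
    return order


def log_rumor_centrality(adj, u):
    """log R(u, G) on the BFS tree of u (Definition 1)."""
    parent, order, queue = {u: None}, [], deque([u])
    while queue:
        x = queue.popleft()
        order.append(x)
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    size = dict.fromkeys(order, 1)
    for x in reversed(order):
        if parent[x] is not None:
            size[parent[x]] += size[x]
    return math.lgamma(len(order) + 1) - sum(math.log(s) for s in size.values())


def error_rank(m, c, n, rng):
    """One run: the infection rank k of the estimated source (k = 1 is correct)."""
    g = erdos_renyi(m, c / m, rng)
    order = spread_first_n(g, rng.randrange(m), n, rng)
    infected = set(order)
    sub = {v: [w for w in g[v] if w in infected] for v in order}
    scores = {v: log_rumor_centrality(sub, v) for v in sub}
    best = max(scores.values())
    ties = [v for v in order if math.isclose(scores[v], best, abs_tol=1e-9)]
    return order.index(rng.choice(ties)) + 1


if __name__ == "__main__":
    rng = random.Random(0)
    ranks = [error_rank(3000, 10, 100, rng) for _ in range(50)]
    for k in range(1, 6):
        print(k, ranks.count(k) / len(ranks))
```

With only 50 runs the empirical frequencies are noisy; increasing the number of runs and the sizes `m` and `n` moves the sketch toward the setting described above, at a correspondingly higher cost.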

4. Proofs. Here, proofs of the results stated in Section 3 are presented. We
establish results for d-regular trees by connecting rumor spreading with Polya urn
models and branching processes. Later, we extend this novel method to establish
results for generic random trees under arbitrary spreading time distributions. After
this, we prove Theorem 3.4 using the standard Chernoff bound and the polynomial
growth property of geometric trees.

4.1. Proof of Theorem 3.1: d-regular trees.

4.1.1. Setup, notations. Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be an infinite d-regular tree and let the
rumor start spreading from a node, say $v_1$. Without loss of generality, we view it as
a randomly generated tree, as described in Section 3, with $v_1$ being the root with
$d$ children and all subsequent nodes having $d - 1$ children (hence each node has
degree $d$). We shall be interested in $d \ge 3$. Now suppose the rumor spreads on
this tree starting from $v_1$ as per the SI model with exponential distribution with rate
$\lambda > 0$.

Initially, node $v_1$ is the only rumor infected node and its $d$ neighbors are potential
nodes that can receive the rumor. We will denote the set of nodes that are not yet
rumor infected but are neighbors of rumor infected nodes as the rumor boundary. Thus,
initially the rumor boundary consists of the $d$ neighbors of $v_1$. Under the SI model,
each edge has an independent exponential clock of mean $1/\lambda$. The minimum of
$d$ independent exponentials of mean $1/\lambda$ is an exponential random variable (of
mean $1/(d\lambda)$) and hence one of the $d$ nodes (chosen uniformly at random) in
the rumor boundary gets infected after an exponential amount of time (of mean
$1/(d\lambda)$). Upon this infection, this node gets removed from the boundary but adds
its $d - 1$ children to the rumor boundary. That is, each infection effectively adds
$d - 2$ new nodes to the rumor boundary. In summary, let $Z(t)$ denote the number
of nodes in the rumor boundary at time $t$; then $Z(0) = d$ and $Z(t)$ evolves as
follows: each of the $Z(t)$ nodes has an exponential clock of mean $1/\lambda$; when it
ticks, it dies and replaces itself with $d - 1$ new nodes, which in turn start their own
independent exponential clocks of mean $1/\lambda$, and so on. The $Z(t)$ thus described
is precisely the multi-class Markov branching process (MCMBP): the multi-class
aspect arises when we think of the contributions of each of the $d$ subtrees of the root
$v_1$ to the rumor boundary separately, and hence effectively have $d$ branching
processes, each starting at time 0 with initial value equal to 1. Let $u_1, \ldots, u_d$ be the
children of $v_1$; let $Z_i(t)$ denote the number of nodes in the rumor boundary that belong
to the subtree $T_i(t)$ that is rooted at $u_i$, with $Z_i(0) = 1$ for $1 \le i \le d$; then
$Z(t) = \sum_{i=1}^d Z_i(t)$. With an abuse of notation, let $T_i(t)$ also denote the total number of
nodes infected in the subtree rooted at $u_i$ at time $t$; initially $T_i(0) = 0$ for $1 \le i \le d$.
Since each infected node adds $d - 2$ nodes to the rumor boundary, it can be easily
checked that $Z_i(t) = (d - 2)T_i(t) + 1$ and hence $Z(t) = (d - 2)T(t) + d$, with $T(t)$ being
the total number of infected nodes at time $t$ (excluding $v_1$, i.e. $T(0) = 0$).
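The bookkeeping above can be checked with a short Python sketch of the embedded jump chain of the boundary process. By memorylessness, each infection hits a uniformly random boundary node, so subtree $i$ grows with probability $Z_i(t)/Z(t)$; the sketch (with a hypothetical function name) tracks only these counts and confirms $Z_i = (d - 2)T_i + 1$.

```python
import random


def boundary_counts(d, infections, rng):
    """Run the jump chain of the rumor boundary on a d-regular tree for a given
    number of infections (excluding the source v_1). Returns (Z, T): Z[i] is the
    boundary size and T[i] the number of infected nodes in the i-th subtree."""
    Z = [1] * d  # Z_i(0) = 1: each child u_i of v_1 starts on the boundary
    T = [0] * d  # T_i(0) = 0
    for _ in range(infections):
        # memorylessness: the next infected node is uniform over the boundary
        i = rng.choices(range(d), weights=Z)[0]
        T[i] += 1      # the chosen boundary node becomes infected ...
        Z[i] += d - 2  # ... leaves the boundary and adds its d - 1 children
    return Z, T


Z, T = boundary_counts(4, 1000, random.Random(0))
print(Z, T)
```

Only the order of infections is simulated here; the exponential holding times, which do not affect which subtree grows, are omitted.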



imsart-aap ver. 2009/12/15 file: Arxivaap.tex date: November 4, 2011

4.1.2. Probability of correct detection, $\mathbb{P}(C^1_{n(t)})$. Suppose we observe the
rumor infected nodes at some time $t$ which we do not know. That is, we observe the
rumor infected graph $G(t)$, which contains the root $v_1$ and its $d$ infected subtrees
$T_i(t)$ for $1 \le i \le d$. We recall the following result of [10] that characterizes the
rumor center.

Lemma 1 ([10]). Given a rumor infected tree $G = (V, E)$, there can be at
most two rumor centers. Specifically, a node $v \in V$ is a rumor center if and only if

$$T_i^v \le \frac{|V|}{2} = \frac{1}{2}\Big(1 + \sum_{j \in \mathcal{N}(v)} T_j^v\Big), \quad \forall\, i \in \mathcal{N}(v), \tag{4.1}$$

where $\mathcal{N}(v) = \{u \in V : (u, v) \in E\}$ are the neighbors of $v$ in $G$ and $T_j^v$ denotes the
subtree of $G$ that is rooted at node $j \in \mathcal{N}(v)$ and includes all nodes that are away
from node $v$ (i.e. the subtree does not include $v$). The node $v$ is the unique rumor
center if the inequality in (4.1) is strict for all $i \in \mathcal{N}(v)$.
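Lemma 1 gives a way to locate the rumor center(s) without computing any factorials: keep the nodes all of whose neighboring subtrees contain at most $|V|/2$ nodes. The Python sketch below (hypothetical function name, tree given as an adjacency dict) obtains every $T_j^v$ from one rooted pass: for a child $j$ of $v$ it is the child's subtree size, and for the parent of $v$ it is $|V|$ minus the size of $v$'s own subtree.

```python
def rumor_centers(adj):
    """All nodes v of a tree with T_j^v <= |V|/2 for every neighbour j (Lemma 1)."""
    root = next(iter(adj))
    parent, order = {root: None}, [root]
    for u in order:  # BFS order; `order` grows during iteration
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                order.append(v)
    size = dict.fromkeys(order, 1)
    for u in reversed(order):
        if parent[u] is not None:
            size[parent[u]] += size[u]
    n = len(order)
    centers = []
    for v in order:
        branches = [size[w] for w in adj[v] if w != parent[v]]  # children of v
        if parent[v] is not None:
            branches.append(n - size[v])  # the side of v containing the root
        if all(2 * b <= n for b in branches):
            centers.append(v)
    return centers


fig1 = {1: [2, 3], 2: [1, 4, 5], 3: [1], 4: [2], 5: [2]}
print(rumor_centers(fig1))  # [2]
```

Two centers appear only when some branch holds exactly $|V|/2$ nodes, as in a path on four nodes, which is why uniqueness in Lemma 1 requires the strict inequality.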

This immediately suggests a characterization of the event that node $v_1$, the true source, is identified by rumor centrality at time $t$: $v_1$ is a rumor center only if $2T_i(t) \le 1 + \sum_{j=1}^d T_j(t)$ for all $1 \le i \le d$, and if the inequality is strict for all $i$, then it is the unique rumor center. Now if $n(t)$ is the total number of infected nodes at time $t$, then, as per our earlier notation, $C^1_{n(t)}$ is the event of correct detection at time $t$.

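Let $E_i = \{2T_i(t) < \sum_{j=1}^d T_j(t)\}$ and $F_i = \{2T_i(t) \le 1 + \sum_{j=1}^d T_j(t)\}$. Then

$$\mathbb{P}\big(C^1_{n(t)}\big) \ge \mathbb{P}\Big(\bigcap_{i=1}^d E_i\Big) = 1 - \mathbb{P}\Big(\bigcup_{i=1}^d E_i^c\Big) \overset{(a)}{\ge} 1 - \sum_{i=1}^d \mathbb{P}(E_i^c) \overset{(b)}{=} 1 - d\,\mathbb{P}(E_1^c). \tag{4.2}$$

Above, (a) follows from the union bound and (b) from symmetry. Similarly, we have

$$\mathbb{P}\big(C^1_{n(t)}\big) \le \mathbb{P}\Big(\bigcap_{i=1}^d F_i\Big) = 1 - \mathbb{P}\Big(\bigcup_{i=1}^d F_i^c\Big) \overset{(a)}{=} 1 - \sum_{i=1}^d \mathbb{P}(F_i^c) \overset{(b)}{=} 1 - d\,\mathbb{P}(F_1^c). \tag{4.3}$$

Above, (a) follows because the events $F_1^c, \dots, F_d^c$ are disjoint (at most one subtree can contain more than half of the infected nodes) and (b) from symmetry. Therefore, the probability of correct detection boils down to evaluating $\mathbb{P}(E_1^c)$ and $\mathbb{P}(F_1^c)$, which, as we shall see, coincide as $t \to \infty$ (equivalently, $n(t) \to \infty$). Therefore, the bounds (4.2) and (4.3) provide the exact value of the correct detection probability as $t \to \infty$.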

4.1.3. $\mathbb{P}(E_1^c)$, $\mathbb{P}(F_1^c)$ and Polya's urn. Effectively, the interest is in the ratio $T_1(t)/\sum_{j=1}^d T_j(t)$, especially as $t \to \infty$ (implicitly we assume that this ratio is well defined for a given $t$; otherwise only one node is infected, namely $v_1$, the true source). It can be easily verified that as $t \to \infty$, $T_i(t) \to \infty$ for all $i$ almost surely, and hence $Z_i(t) = (d-2)T_i(t) + 1$ goes to $\infty$ as well. Therefore, it is sufficient to study the ratio $Z_1(t)/\sum_{j=1}^d Z_j(t)$ as $t \to \infty$, since we shall find that this ratio converges to a random variable with a density on $[0, 1]$. In summary, if we establish that $Z_1(t)/\sum_{j=1}^d Z_j(t)$ converges in distribution to a random variable on $[0, 1]$ with a well defined density, then it immediately follows that $\mathbb{P}(E_1^c) = \mathbb{P}(F_1^c)$ as $t \to \infty$ (the two events differ only when $T_1(t)/\sum_{j=1}^d T_j(t)$ lies within $1/(2\sum_{j=1}^d T_j(t))$ of $1/2$), and we can use $Z_1(t)/\sum_{j=1}^d Z_j(t)$ in place of $T_1(t)/\sum_{j=1}^d T_j(t)$ to evaluate $\mathbb{P}(E_1^c)$ as $t \to \infty$.

With these in mind, let us study the ratio $Z_1(t)/\sum_{j=1}^d Z_j(t)$. For this, it is instructive to view the simultaneous evolution of $(Z_1(t), Z_{-1}(t))$, where $Z_{-1}(t) = \sum_{j=2}^d Z_j(t)$, as that induced by the standard, discrete time Polya's urn: initially, $\tau_0 = 0$ and a given urn contains one ball of type 1, representing $Z_1(\tau_0) = 1$, and $d-1$ balls of type 2, representing $Z_{-1}(\tau_0) = d-1$; the $n$th event happens at time $\tau_n$, when one of the $Z_1(\tau_{n-1}) + Z_{-1}(\tau_{n-1})$ $(= d + (n-1)(d-2))$ balls, chosen uniformly at random, is thrown out of the urn and $d-1$ new balls of its type are added to the urn. If we set $\tau_n - \tau_{n-1}$ equal to an exponential random variable with mean $1/(\lambda(d + (n-1)(d-2)))$, then it is easy to check that the fraction of balls of type 1 is identical in law to $Z_1(t)/\sum_{j=1}^d Z_j(t)$ (here we crucially use the memoryless property of exponential random variables). Therefore, for our purposes, it is sufficient to study the limit law of the fraction of balls of type 1 under this Polya's urn model.

It is well understood that the fraction of balls of type 1 at time $\tau_n$, which is equal to $Z_1(\tau_n)/(Z_1(\tau_n) + Z_{-1}(\tau_n))$, is a martingale with values in $[0, 1]$. By the standard martingale convergence theorem, it converges almost surely to a well defined random variable. Further, the law of this limiting random variable in our particular case turns out to be the Beta distribution with parameters $a = 1/(d-2)$ and $b = (d-1)/(d-2)$. (See the chapter on generalized Polya's urns in [1], for example, for the details of the proof of this statement; a more general version of this result will be used later in the context of general random trees.)
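The Beta limit can be sanity-checked by simulation. The following Python sketch runs the urn described above for $d = 3$, where the predicted limit is Beta(1, 2); the function name `polya_fraction`, the number of draws and the sample size are illustrative choices, and finite-$n$ effects are ignored.

```python
import random

def polya_fraction(d, steps, rng):
    """Fraction of type-1 balls after `steps` draws of the urn described above.

    Start: 1 ball of type 1 and d - 1 balls of type 2. Each draw picks a ball
    uniformly at random, throws it out and adds d - 1 balls of the same type,
    so the urn grows by d - 2 balls per draw.
    """
    type1, total = 1, d
    for _ in range(steps):
        if rng.random() < type1 / total:   # a type-1 ball was drawn
            type1 += d - 2
        total += d - 2
    return type1 / total

rng = random.Random(2011)
d = 3
samples = [polya_fraction(d, 500, rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
tail = sum(x >= 0.5 for x in samples) / len(samples)
print(f"mean = {mean:.3f} (Beta mean 1/d = {1 / d:.3f})")
print(f"P(fraction >= 1/2) = {tail:.3f} (Beta(1, 2) gives 0.25)")
```

Because the fraction is a martingale, its expectation stays at $1/d$ for every number of draws; the tail estimate should approach $1 - I_{1/2}(1, 2) = 1/4$ up to Monte Carlo error.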

From the above discussion, we conclude that the ratio $Z_1(t)/\sum_{j=1}^d Z_j(t)$ converges to a Beta random variable with parameters $a = 1/(d-2)$ and $b = (d-1)/(d-2)$. Since the Beta distribution has a density on $[0, 1]$, it follows that as $t \to \infty$ (equivalently, $n(t) \to \infty$), $\mathbb{P}(E_1^c) = \mathbb{P}(F_1^c)$, and hence, from (4.2) and (4.3),

$$\lim_{t\to\infty} \mathbb{P}\big(C^1_{n(t)}\big) = 1 - d\Big(1 - I_{1/2}\Big(\frac{1}{d-2}, 1 + \frac{1}{d-2}\Big)\Big), \tag{4.4}$$

where, recall, $I_{1/2}(a, b)$ is the probability that a Beta random variable with parameters $a$ and $b$ takes a value in $[0, 1/2]$. Note that this establishes the result of Theorem 3.1 for $k = 1$ in (3.2).
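For $d = 3$, (4.4) gives $1 - 3(1 - I_{1/2}(1, 2)) = 1 - 3/4 \cdot 1 + \ldots = 1/4$, since $I_{1/2}(1, 2) = 3/4$. A rough end-to-end Monte Carlo sketch of this value follows. It assumes exponential spreading times (so the next infection is a uniformly random boundary node), stops after a fixed number of infections rather than at a time $t$, and applies the criterion of Lemma 1; the function name and the parameters (300 infections, 2000 runs) are illustrative.

```python
import random

def source_is_rumor_center(d, n_infections, rng):
    """One SI run from v_1 on a d-regular tree; is v_1 a rumor center at the end?

    Only the subtree sizes T_i matter, so we track, for each boundary node,
    which root subtree it belongs to.
    """
    boundary = list(range(d))
    sizes = [0] * d
    for _ in range(n_infections):
        j = rng.randrange(len(boundary))
        i = boundary[j]
        boundary[j] = boundary[-1]
        boundary.pop()
        boundary.extend([i] * (d - 1))
        sizes[i] += 1
    total = sum(sizes)
    # Lemma 1: v_1 is a rumor center iff 2 T_i <= 1 + sum_j T_j for every i.
    return all(2 * s <= 1 + total for s in sizes)

rng = random.Random(42)
runs = 2000
hits = sum(source_is_rumor_center(3, 300, rng) for _ in range(runs))
print(hits / runs)   # should be near the limit 1/4 predicted by (4.4) for d = 3
```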

4.1.4. Probability of $C^k_{n(t)}$. Thus far we have established Theorem 3.1 for $k = 1$, i.e. for the probability of the rumor center being the true source. The probability of the event $C^k_{n(t)}$ (the $k$th infected node being the rumor center) is evaluated in an almost identical manner, with a minor difference. For this reason, we present an abridged version of the proof.

Let $v_k$, $k \ge 2$, be the $k$th infected node when the rumor starts from $v_1$. We will evaluate the probability of identifying $v_k$ as the rumor center. As before, let us suppose we observe the infected tree $G$ at time $t$ with $n(t) \ge k$ nodes. Let $w_1, \dots, w_d$ be the $d$ neighbors of $v_k$ with respect to $\mathcal{G}$. Note that one of the neighbors of $v_k$ is infected before it; equivalently, viewing the tree as rooted at $v_1$, this neighbor is the 'parent' of $v_k$. We shall denote it by $w_1$, and let $w_2, \dots, w_d$ be the $d-1$ 'children' of $v_k$. Let $T_i^k(t)$ be the subtrees of $G$ rooted at $w_i$ if we imagine $v_k$ as the root of $G$: therefore, $T_1^k(t)$ is rooted at $w_1$ and includes $v_1, \dots, v_{k-1}$; $T_i^k(t)$ is rooted at $w_i$ for $2 \le i \le d$ and contains the nodes in $G$ that are away from $v_k$; none of the $T_i^k(t)$, $1 \le i \le d$, include $v_k$. By definition $T_1^k(t)$ is never empty, but $T_i^k(t)$ can be empty if $w_i$ is not infected, for $2 \le i \le d$. As before, with an abuse of notation, we shall also use $T_i^k(t)$ for the size of the subtree. As per Lemma 1, $v_k$ is identified as a rumor center if and only if all of its $d$ subtrees are balanced, i.e.

$$2T_i^k(t) \le 1 + \sum_{j=1}^d T_j^k(t), \quad \forall\, 1 \le i \le d. \tag{4.5}$$

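Therefore, as before,

$$\mathbb{P}\big(C^k_{n(t)}\big) \ge \mathbb{P}\Big(\bigcap_{i=1}^d E_i\Big) = 1 - \mathbb{P}\Big(\bigcup_{i=1}^d E_i^c\Big) \ge 1 - \sum_{i=1}^d \mathbb{P}(E_i^c), \tag{4.6}$$

and

$$\mathbb{P}\big(C^k_{n(t)}\big) \le \mathbb{P}\Big(\bigcap_{i=1}^d F_i\Big) = 1 - \mathbb{P}\Big(\bigcup_{i=1}^d F_i^c\Big) = 1 - \sum_{i=1}^d \mathbb{P}(F_i^c). \tag{4.7}$$

Above, $E_i = \{2T_i^k(t) < \sum_{j=1}^d T_j^k(t)\}$ and $F_i = \{2T_i^k(t) \le 1 + \sum_{j=1}^d T_j^k(t)\}$.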

To evaluate these probabilities, as before, we shall study the evolution of the rumor boundaries of the subtrees. Unlike the earlier situation, where all events had the same probability, we need to be a bit careful for $k \ge 2$. Specifically, note that at the time when the $k$th node $v_k$ gets infected, the tree $T_1^k(\cdot)$ has size $k-1$ and its rumor boundary, $Z_1^k(\cdot)$, has size $(d-2)(k-1) + 1$. But for $2 \le i \le d$, $T_i^k(\cdot)$ is empty and its rumor boundary, $Z_i^k(\cdot)$, has size 1. Beyond this difference in initial values, the evolution is the same as before, and therefore the limiting laws of the ratios of the sizes of the rumor boundaries can be evaluated, as before, as the limit of the fraction of balls of a given type in a Polya's urn model, now with different initial numbers of balls of the two types. Specifically, to evaluate $\mathbb{P}(E_1^c)$ (and $\mathbb{P}(F_1^c)$), we consider a Polya's urn in which we start with $(d-2)(k-1) + 1$ balls of type 1 (corresponding to $Z_1^k(\cdot)$) and $d-1$ balls of type 2 (corresponding to $\sum_{j=2}^d Z_j^k(\cdot)$). With these initial conditions, the limit law of the fraction of balls of type 1 turns out to be (see [1] for details) a Beta distribution with parameters $a = ((d-2)(k-1) + 1)/(d-2) = (k-1) + 1/(d-2)$ and $b = (d-1)/(d-2) = 1 + 1/(d-2)$. In summary, the fraction of balls of type 1 equals the ratio $Z_1^k(t)/\sum_{j=1}^d Z_j^k(t)$, which is asymptotically equal to the ratio $T_1^k(t)/\sum_{j=1}^d T_j^k(t)$ as $t \to \infty$, since these quantities go to infinity as $t \to \infty$ and $Z_i^k(t)$ is linearly related to $T_i^k(t)$ for $1 \le i \le d$. Since the limit law of the fraction of balls of type 1 has a density on $[0, 1]$, we conclude that as $t \to \infty$

$$\mathbb{P}(E_1^c) = \mathbb{P}(F_1^c) = 1 - I_{1/2}\Big(k - 1 + \frac{1}{d-2},\, 1 + \frac{1}{d-2}\Big). \tag{4.8}$$

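To evaluate $\mathbb{P}(E_i^c)$ (and $\mathbb{P}(F_i^c)$) for $2 \le i \le d$, in the corresponding Polya's urn model we start with 1 ball of type 1 and $(d-2)(k-1) + 1 + (d-2) = k(d-2) + 1$ balls of type 2. Therefore, using an identical sequence of arguments, we obtain that for $2 \le i \le d$, as $t \to \infty$,

$$\mathbb{P}(E_i^c) = \mathbb{P}(F_i^c) = 1 - I_{1/2}\Big(\frac{1}{d-2},\, k + \frac{1}{d-2}\Big). \tag{4.9}$$

From (4.6)-(4.9), it follows that

$$\lim_{t\to\infty} \mathbb{P}\big(C^k_{n(t)}\big) = I_{1/2}\Big(k - 1 + \frac{1}{d-2},\, 1 + \frac{1}{d-2}\Big) + (d-1)\Big(I_{1/2}\Big(\frac{1}{d-2},\, k + \frac{1}{d-2}\Big) - 1\Big). \tag{4.10}$$

This establishes (3.2) for all $k$ and hence completes the proof of Theorem 3.1.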
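Formula (4.10) is easy to evaluate numerically. The sketch below is our own illustration, not code from the paper: it computes $I_{1/2}(a, b)$ with the standard hypergeometric series for the regularized incomplete beta function and rewrites (4.10) using the symmetry $1 - I_{1/2}(a, b) = I_{1/2}(b, a)$ to avoid cancellation. The names `inc_beta_half` and `detection_probability` are hypothetical.

```python
import math

def inc_beta_half(a, b, tol=1e-16):
    """Regularized incomplete beta I_{1/2}(a, b) = P(Beta(a, b) <= 1/2).

    Uses the standard series I_x(a, b) = x^a (1 - x)^b / (a B(a, b)) * S, where
    S = sum_{n >= 0} (a + b)_n / (a + 1)_n x^n; at x = 1/2 the terms eventually
    shrink geometrically.
    """
    x = 0.5
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_pre = a * math.log(x) + b * math.log(1.0 - x) - math.log(a) - log_beta
    term, series, n = 1.0, 1.0, 0
    while True:
        term *= x * (a + b + n) / (a + 1.0 + n)
        series += term
        n += 1
        if n > a + b and term < tol * series:
            return math.exp(log_pre) * series

def detection_probability(d, k=1):
    """Limit of P(C^k_{n(t)}) on a d-regular tree (d >= 3), following (4.10)."""
    s = 1.0 / (d - 2)
    # (4.10) as 1 - P(E_1^c) - (d - 1) P(E_2^c), with (4.8), (4.9) and
    # 1 - I_{1/2}(a, b) = I_{1/2}(b, a).
    p_parent = inc_beta_half(1.0 + s, k - 1.0 + s)
    p_child = inc_beta_half(k + s, s)
    return 1.0 - p_parent - (d - 1) * p_child

print(detection_probability(3), detection_probability(4))  # about 0.25 and 4/pi - 1 = 0.2732
```

For $d = 3$ a short calculation from (4.10) gives the closed form $k/2^{k+1}$, whose values sum to 1 over $k \ge 1$; the code reproduces this, which makes a convenient consistency check.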

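4.2. Proof of Corollary 1. A simple analysis yields Corollary 1. We start by defining the asymptotic probability of correct detection for a $d$-regular tree as $\alpha_d = \lim_{t\to\infty} \mathbb{P}\big(C^1_{n(t)}\big)$. By (4.4), this quantity becomes

$$\alpha_d = 1 - d\Big(1 - I_{1/2}\Big(\frac{1}{d-2},\, 1 + \frac{1}{d-2}\Big)\Big) = 1 - \frac{d\,\Gamma\big(1 + \frac{2}{d-2}\big)}{\Gamma\big(\frac{1}{d-2}\big)\,\Gamma\big(1 + \frac{1}{d-2}\big)} \int_{1/2}^{1} t^{\frac{1}{d-2}-1}(1-t)^{\frac{1}{d-2}}\,dt.$$

We then take the limit as $d$ approaches infinity:

$$\lim_{d\to\infty} \alpha_d = 1 - \lim_{d\to\infty} \frac{d\,\Gamma\big(1 + \frac{2}{d-2}\big)}{\Gamma\big(\frac{1}{d-2}\big)\,\Gamma\big(1 + \frac{1}{d-2}\big)} \int_{1/2}^{1} t^{\frac{1}{d-2}-1}(1-t)^{\frac{1}{d-2}}\,dt$$
$$= 1 - \lim_{d\to\infty} \frac{d}{d - 2 - \gamma} \int_{1/2}^{1} t^{\frac{1}{d-2}-1}(1-t)^{\frac{1}{d-2}}\,dt = 1 - \int_{1/2}^{1} t^{-1}\,dt = 1 - \ln 2.$$

Above, $\gamma$ is the Euler-Mascheroni constant, and we have used $\Gamma(1 + x) \to 1$ as $x \to 0$ together with the following approximation of $\Gamma(x)$ for small $x$:

$$\Gamma(x) = \frac{1}{x} - \gamma + O(x).$$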
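The Gamma-function form of $\alpha_d$ above can also be evaluated directly, which shows the approach to $1 - \ln 2 \approx 0.3069$. This is a sketch under our own choices: the integral is approximated by composite Simpson's rule with a fixed number of panels, and the name `alpha_gamma_form` is hypothetical.

```python
import math

def alpha_gamma_form(d, n=20000):
    """alpha_d from its Gamma-function form, with a = 1/(d - 2):

    alpha_d = 1 - d G(1 + 2a) / (G(a) G(1 + a)) * int_{1/2}^1 t^(a-1) (1-t)^a dt,
    where the integral uses composite Simpson's rule (n must be even).
    """
    a = 1.0 / (d - 2)
    f = lambda t: t ** (a - 1.0) * (1.0 - t) ** a
    lo, hi = 0.5, 1.0
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for j in range(1, n):
        s += (4 if j % 2 else 2) * f(lo + j * h)
    integral = s * h / 3.0
    coeff = d * math.gamma(1 + 2 * a) / (math.gamma(a) * math.gamma(1 + a))
    return 1.0 - coeff * integral

for d in (3, 4, 10, 100, 10_000):
    print(d, round(alpha_gamma_form(d), 4))
print("1 - ln 2 =", round(1 - math.log(2), 4))
```

For $d = 3$ the integrand is the polynomial $1 - t$ and the formula gives exactly $1 - 6 \cdot \tfrac{1}{8} = 1/4$; for large $d$ the prefactor tends to 1 and the integral to $\ln 2$, as in the derivation.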

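4.3. Proof of Corollary 2. Corollary 2 follows from (3.2) and the monotonicity of the $\Gamma$ function over $[2, \infty)$ as follows: for $k \ge 2$,

$$\lim_{t\to\infty} \mathbb{P}\big(C^k_{n(t)}\big) \le I_{1/2}\Big(k-1+\frac{1}{d-2},\, 1+\frac{1}{d-2}\Big) = \frac{\Gamma\big(k+\frac{2}{d-2}\big)}{\Gamma\big(k-1+\frac{1}{d-2}\big)\,\Gamma\big(1+\frac{1}{d-2}\big)} \int_0^{1/2} t^{k-2+\frac{1}{d-2}}(1-t)^{\frac{1}{d-2}}\,dt$$
$$\overset{(a)}{\le} \frac{\Gamma\big(k+\frac{2}{d-2}\big)}{\Gamma\big(k-1+\frac{1}{d-2}\big)\,\Gamma\big(1+\frac{1}{d-2}\big)} \int_0^{1/2} t^{k-2}\,dt \overset{(b)}{\le} \frac{(2e)^2\,\Gamma(k+2)}{\Gamma(k-1)\,\Gamma(1)} \int_0^{1/2} t^{k-2}\,dt$$
$$= 4e^2\,k(k+1)\,2^{-(k-1)} = \exp\big(-k\ln 2 + \epsilon(k)\big) = \exp\big(-\Theta(k)\big),$$

where $\epsilon(k) = O(\log k)$. Above, the first inequality holds because the second term in (4.10) is non-positive. (a) follows from the fact that $t < 1$, and hence $t^{k-2+1/(d-2)} \le t^{k-2}$, together with $(1-t)^{1/(d-2)} \le 1$. For (b), we need to use the properties of the $\Gamma$ function carefully. It is well known that $\Gamma$ achieves its minimum value over the positive reals in $[1, 2]$, that $\Gamma(1) = \Gamma(2) = 1$, and that $\Gamma(x)$ increases for $x \ge 2$. Moreover, the minimum value of the $\Gamma$ function over the positive reals is at least $1/(2e)$. Since $d \ge 3$, $1 + 1/(d-2) \in [1, 2]$. Therefore, $\Gamma(1 + 1/(d-2)) \ge \Gamma(1)/(2e)$. Similarly, $\Gamma(k - 1 + 1/(d-2)) \ge \Gamma(k-1)/(2e)$ for all $k \ge 2$ and $d \ge 3$. Finally, since $2 \le k + 2/(d-2) \le k + 2$, monotonicity gives $\Gamma(k + 2/(d-2)) \le \Gamma(k+2)$. Therefore, (b) follows.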

4.4. Proof of Theorem 3.2: correct detection for random trees. The goal is to establish that there is a strictly positive probability of detecting the source correctly as the rumor center when the rumor starts at the root of a generic randomly generated tree; this is with respect to the joint probability distribution induced by the tree construction and the SI rumor spreading model with independent spreading times with distribution $F(t)$. We establish this result along the lines of the proof of Theorem 3.1. However, it requires additional details due to the irregularity and randomness of the node degrees in the tree and the arbitrary spreading time distribution $F(t)$.

4.4.1. Notations. We quickly recall some notation. To start with, as before, let $v_1$ be the root node of the tree. It has $\eta_0$ children distributed as per $\mathcal{D}_0$. By assumption of Theorem 3.2, $\mathbb{P}(\eta_0 \ge 3) > 0$. We shall condition on this positive probability event that $\eta_0 \ge 3$ and let $d = \eta_0$ for the remainder of the proof. Let $u_1, \dots, u_d$ be the $d$ children of $v_1$. The random tree $\mathcal{G}$ is constructed by adding a random number of children to $u_1, \dots, u_d$ recursively as per distribution $\mathcal{D}$, as explained in Section 3.2. The rumor starts at $v_1$ at time 0 and spreads as per the SI model on $\mathcal{G}$ with spreading times whose cumulative distribution function $F: \mathbb{R} \to [0, 1]$ is such that $F(0) = 0$, $F(0^+) = 0$ and $\lim_{t\to\infty} F(t) = 1$.

Let $G$ be the subtree of $\mathcal{G}$ that is infected at time $t$, let $n(t)$ be the number of nodes in $G$ at time $t$, and let $T_i(t)$ denote the subtree of $G$ rooted at node $u_i$ at time $t$, for $1 \le i \le d$. We shall abuse notation as before by using $T_i(t)$ for the subtree size as well. By definition $T_i(0) = 0$ for $1 \le i \le d$. Let $Z_i(t)$ denote the size of the rumor boundary of $T_i(t)$; initially $Z_i(0) = 1$.

Since we are interested in evaluating the probability of detection with respect to the joint distribution over the tree generation and the SI spreading model, we model the evolution of $Z_i(\cdot)$ and $T_i(\cdot)$ as follows. Each node in the rumor boundary has its own independent clock with distribution $F(t)$. When the clock of a particular node ticks, the node dies and is replaced by $\eta$ new nodes, where $\eta$ is an independent random variable distributed as per $\mathcal{D}$. If the node that died belonged to $Z_i(\cdot)$ (i.e. to tree $T_i(\cdot)$), then the new nodes are added to $Z_i(\cdot)$. Therefore, each $Z_i(\cdot)$ is a general time branching process with $Z_i(0) = 1$ for all $1 \le i \le d$. Now we recall some facts about branching processes that will be useful for establishing the non-triviality of the probability of correct detection for such randomly generated trees.

First we define what is known as the Malthusian parameter of the branching process.

Definition 4 ([1, p. 146]). The Malthusian parameter for a constant $\gamma$ and a distribution $F$ is the root, provided it exists, of the equation

$$\gamma \int_0^\infty e^{-\alpha y}\,dF(y) = 1. \tag{4.11}$$

We denote it by $\alpha = \alpha(\gamma, F)$.
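Equation (4.11) involves $F$ only through its Laplace transform $\alpha \mapsto \int_0^\infty e^{-\alpha y}\,dF(y)$, which is decreasing in $\alpha$, so the root can be found by bisection. The Python sketch below is illustrative (the name `malthusian` and the caller-supplied `laplace` function are our assumptions); it is checked on exponential spreading times, for which the closed form $\alpha = \lambda(m-1)$ is derived next, and on a deterministic unit delay, for which $\alpha = \ln m$.

```python
import math

def malthusian(gamma, laplace, hi=1.0, tol=1e-12):
    """Solve gamma * E[exp(-alpha * Y)] = 1 for alpha >= 0 by bisection.

    `laplace(alpha)` must return E[exp(-alpha * Y)] for Y ~ F; it is decreasing
    in alpha. For gamma > 1 the left side exceeds 1 at alpha = 0 and, if F has
    no atom at 0, tends to 0 as alpha grows, so a root exists.
    """
    g = lambda alpha: gamma * laplace(alpha) - 1.0
    lo = 0.0
    while g(hi) > 0:          # expand the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Exponential spreading times with rate lam: E[exp(-alpha Y)] = lam / (alpha + lam).
lam, m = 2.0, 3.0
print(malthusian(m, lambda a: lam / (a + lam)))   # close to lam * (m - 1) = 4
# Deterministic unit delay: E[exp(-alpha Y)] = exp(-alpha), so alpha = ln(m).
print(malthusian(m, lambda a: math.exp(-a)))      # close to ln 3
```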

For any $\gamma > 1$ the Malthusian parameter always exists. The Malthusian parameter characterizes the growth rate of a general time branching process. Consider, for example, a Markov branching process with exponential spreading times of rate $\lambda$, and let $\mathbb{E}[\eta] = m > 1$. The spreading time distribution is $F(t) = 1 - e^{-\lambda t}$. Then the Malthusian parameter $\alpha(m, F)$ for this process is given by

$$m \int_0^\infty e^{-\alpha y}\,\lambda e^{-\lambda y}\,dy = 1 \;\Longleftrightarrow\; \frac{m\lambda}{\alpha + \lambda} = 1 \;\Longleftrightarrow\; \alpha = \lambda(m - 1).$$

This is exactly the growth rate of the mean of a Markov branching process [1]. The Malthusian parameter also describes the growth of the mean of general time branching processes, as the following theorem shows.



Theorem 4.1 ([1, Theorem 3A, p. 152]). Let $Z(\cdot)$ be a continuous time branching process as described above: $Z(0) = 1$; each node in $Z(t)$ has an independent clock with distribution whose CDF is $F$ as described above, and it dies upon the tick of its clock; upon death, a node is replaced by $\eta$ new nodes, chosen independently for each node, and so on. If $\mathbb{E}[\eta] = m > 1$, then

$$\lim_{t\to\infty} \frac{\mathbb{E}[Z(t)]}{c\,e^{\alpha t}} = 1,$$

where $\alpha$ is the Malthusian parameter for $(m, F)$ and

$$c = \frac{m - 1}{\alpha\, m^2 \int_0^\infty y\, e^{-\alpha y}\,dF(y)}.$$
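The constant $c$ only requires one integral against $F$. As a minimal sketch, assuming $F$ has a density and truncating the integral at a finite upper limit (both our own simplifications; `growth_constant` is a hypothetical name), the midpoint rule reproduces the value $c = 1$ that is derived in closed form below for exponential spreading times:

```python
import math

def growth_constant(m, alpha, density, upper, n=200000):
    """c = (m - 1) / (alpha * m^2 * int_0^inf y e^{-alpha y} dF(y)), as in Theorem 4.1.

    Assumes F has a density `density`; the integral is truncated at `upper`
    and approximated with the midpoint rule on n cells.
    """
    h = upper / n
    integral = h * sum(
        (j + 0.5) * h * math.exp(-alpha * (j + 0.5) * h) * density((j + 0.5) * h)
        for j in range(n)
    )
    return (m - 1.0) / (alpha * m * m * integral)

lam, m = 2.0, 3.0
alpha = lam * (m - 1.0)        # Malthusian parameter for exponential spreading times
c = growth_constant(m, alpha, lambda y: lam * math.exp(-lam * y), upper=20.0)
print(round(c, 6))             # close to 1, matching the closed-form computation
```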



This theorem says that the mean of the branching process $Z(t)$ grows exponentially, with rate given by the Malthusian parameter $\alpha(m, F)$. As an example of this general theorem, consider again the Markov branching process with exponential spreading times with rate $\lambda$. We have already seen that the Malthusian parameter is $\alpha = \lambda(\mathbb{E}[\eta] - 1)$. The constant $c$ evaluates to

$$c = \frac{m-1}{\alpha m^2 \int_0^\infty y\, e^{-\alpha y}\,\lambda e^{-\lambda y}\,dy} = \frac{(m-1)(\alpha+\lambda)^2}{\lambda\alpha m^2} = \frac{(m-1)\big(\lambda(m-1)+\lambda\big)^2}{\lambda^2(m-1)m^2} = \frac{(m-1)\lambda^2 m^2}{\lambda^2(m-1)m^2} = 1.$$

Therefore, for the Markov branching process we have $\mathbb{E}[Z(t)] = e^{\lambda(m-1)t}$, a well known result [1]. For the general time branching process, we have the following result.

Theorem 4.2 ([1, Theorem 1, p. 172]). Let $Z(\cdot)$ be a continuous time branching process as described above: $Z(0) = 1$; each node in $Z(t)$ has an independent clock with distribution whose CDF is given by $F$, and it dies upon the tick of its clock; upon death, a node is replaced by $\eta$ new nodes, chosen independently for each node, and so on. Let $\alpha$ be the Malthusian parameter for $(\mathbb{E}[\eta], F)$ and let $c$ be defined as in Theorem 4.1. If $\mathbb{E}[\eta] > 1$ and $\mathbb{E}[\eta \log \eta] < \infty$, then

$$Z(t)/(c\,e^{\alpha t}) \to W \quad \text{in distribution},$$

where $W$ is such that

$$\mathbb{P}(W = 0) = q, \tag{4.12}$$

$$\mathbb{P}\big(W \in (x_1, x_2)\big) = \int_{x_1}^{x_2} w(y)\,dy, \quad \text{for } 0 < x_1 < x_2 < \infty, \tag{4.13}$$

where $q \in [0, 1)$ is the smallest root of the equation $f_\eta(s) = s$ in $[0, 1]$ and $w(\cdot)$ is a density, absolutely continuous with respect to the Lebesgue measure, such that $\int_0^\infty w(y)\,dy = 1 - q$. Here $f_\eta(s) = \sum_{k=0}^\infty s^k\,\mathbb{P}(\eta = k)$.

As we will see, this theorem will be key to proving Theorem 3.2.
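The atom $q = \mathbb{P}(W = 0)$ in (4.12) is the extinction probability of the embedded Galton-Watson process, and the standard way to compute it is the fixed-point iteration $s \leftarrow f_\eta(s)$ started from $s = 0$, which increases monotonically to the smallest fixed point. The sketch below uses a hypothetical offspring law chosen so that the answer is known in closed form; the function name is illustrative.

```python
def extinction_probability(pmf, iters=10_000, tol=1e-14):
    """Smallest root q in [0, 1] of f(s) = s, where f(s) = sum_k pmf[k] * s**k.

    Iterating s <- f(s) from s = 0 increases monotonically to the smallest
    fixed point, which is the extinction probability q = P(W = 0).
    """
    f = lambda s: sum(p * s ** k for k, p in enumerate(pmf))
    s = 0.0
    for _ in range(iters):
        s_next = f(s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s

# Hypothetical offspring law: P(eta = 0) = 1/4, P(eta = 2) = 3/4, so
# f(s) = 1/4 + 3/4 s^2 and the fixed points are s = 1/3 and s = 1.
print(extinction_probability([0.25, 0.0, 0.75]))  # approximately 1/3
```

If $\mathbb{P}(\eta = 0) = 0$ (every node has at least one child), then $f_\eta(0) = 0$ and the iteration returns $q = 0$ immediately.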
4.4.2. Correct detection. Recall from the proof for regular trees that the probability of the event of correct detection at time $t$, $C^1_{n(t)}$, is lower bounded as

$$\mathbb{P}\big(C^1_{n(t)}\big) \ge \mathbb{P}\Big(\bigcap_{i=1}^d \Big\{2T_i(t) < \sum_{j=1}^d T_j(t)\Big\}\Big). \tag{4.14}$$

We shall establish a non-trivial lower bound for the right hand side of (4.14) using Theorem 4.2. This will be done in two steps: (i) identify an event $\mathcal{E} \subseteq \bigcap_{i=1}^d \{2Z_i(t) < \sum_{j=1}^d Z_j(t)\}$ and then establish a non-trivial lower bound on $\mathbb{P}(\mathcal{E})$ using Theorem 4.2; (ii) establish that, as $t \to \infty$, $\mathcal{E} \subseteq \bigcap_{i=1}^d \{2T_i(t) < \sum_{j=1}^d T_j(t)\}$. This will immediately imply that $\mathbb{P}(\mathcal{E})$ is a non-trivial lower bound on $\mathbb{P}\big(C^1_{n(t)}\big)$.

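4.4.3. A non-trivial event. The event $\{2Z_i(t) < \sum_{j=1}^d Z_j(t)\}$ is equivalent to $\{2Z_i(t)c^{-1}e^{-\alpha t} < \sum_{j=1}^d Z_j(t)c^{-1}e^{-\alpha t}\}$, with the Malthusian parameter $\alpha$ and $c$ defined as in Theorem 4.1. For any $x > 0$, define the event $\mathcal{E}(x)$ as

$$\mathcal{E}(x) = \bigcap_{i=1}^d \Big\{Z_i(t)c^{-1}e^{-\alpha t} \in \big(x, (d-1)x\big)\Big\}. \tag{4.15}$$

Under event $\mathcal{E}(x)$, since each $Z_i(t)c^{-1}e^{-\alpha t}$ is at least $x$ and at most $(d-1)x$, it follows immediately that

$$\mathcal{E}(x) \subseteq \Big\{2\max_i Z_i(t) < \sum_{j=1}^d Z_j(t)\Big\} = \bigcap_{i=1}^d \Big\{2Z_i(t) < \sum_{j=1}^d Z_j(t)\Big\}. \tag{4.16}$$

Now the $Z_i(\cdot)$ are independent across $1 \le i \le d$. By Theorem 4.2 it follows that $Z_i(t)c^{-1}e^{-\alpha t}$ converges to $W_i$ (because $\mathbb{E}[\eta^2] < \infty$ and hence $\mathbb{E}[\eta \log \eta] < \infty$), where the $W_i$ are independent across $i$ and distributed as per (4.12)-(4.13). From this it follows that as $t \to \infty$,

$$\mathbb{P}\big(\mathcal{E}(x)\big) \to p(x)^d, \quad \text{where } p(x) = \int_x^{(d-1)x} w(y)\,dy > 0. \tag{4.17}$$

From the above discussion, it follows that as $t \to \infty$,

$$\mathbb{P}\Big(\bigcap_{i=1}^d \Big\{2Z_i(t) < \sum_{j=1}^d Z_j(t)\Big\}\Big) \ge \Big(\sup_{x > 0} p(x)\Big)^d > 0. \tag{4.18}$$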
4.4.4. Equivalence of boundary and tree processes. For regular trees, the fractions $Z_i(t)/\sum_{j=1}^d Z_j(t)$ and $T_i(t)/\sum_{j=1}^d T_j(t)$ converge to the same limit primarily because each of them converges to $\infty$ and $Z_i(t)$ and $T_i(t)$ are related linearly. Such a relation does not exist for the random tree. However, with an additional, careful argument, we shall establish the same fact for random trees as well, under the event $\mathcal{E}(x)$ for $x > 0$.

To that end, for $1 \le i \le d$ and $n \ge 0$, define

$$M_{n+1}^i = M_n^i + Q_{n+1}^i(\eta_{n+1} - \mu), \tag{4.19}$$

where $Q_{n+1}^i$ is the indicator that the $(n+1)$th rumor infected node is added to the tree $T_i(\cdot)$, $\eta_{n+1}$ is the number of children of this infected node, and $\mu = \mathbb{E}[\eta]$. We define $M_0^i = 0$. It can be checked that $M_n^i$ is a martingale with respect to the filtration $\mathcal{F}_n$, where $\mathcal{F}_n$ contains all the information about the part of the graph $\mathcal{G}$ to which the rumor has spread, including the rumor boundary. Indeed, $M_n^i$ is a martingale with respect to $\mathcal{F}_n$ because (a) $Q_{n+1}^i$ is a binary random variable whose probability of being 1 is determined by $\mathcal{F}_n$, and (b) $\eta_{n+1}$ is independent of $\mathcal{F}_n$ and distributed as per $\mathcal{D}$. Now $|M_{n+1}^i - M_n^i| \le |\eta_{n+1} - \mu|$, and $\eta_{n+1}$ has a well defined mean and finite second moment. Since martingale increments are orthogonal, it therefore follows that

$$\mathbb{E}\big[(M_n^i)^2\big] \le n\,\mathbb{E}[\eta^2].$$

Therefore, a straightforward application of Chebyshev's inequality leads to the following 'weak law of large numbers':

$$\frac{1}{n} M_n^i \to \mathbb{E}[M_n^i] = 0, \quad \text{in probability}. \tag{4.20}$$

Now under event $\mathcal{E}(x)$ (for any $x > 0$), $Z_i(t)$ scales like $e^{\alpha t}$ for all $1 \le i \le d$. Therefore, it can be checked that $T_i(t)$ also scales like $e^{\alpha t}$ (since $Z_i(t)$ represents the 'rate' at which $T_i(t)$ is growing). Precisely, we have that



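$$\mathbb{E}[Z_i(t)] = 1 + \mathbb{E}\Big[\sum_{\ell \in T_i(t)} (\eta_\ell - 1)\Big] = 1 + (\mu - 1)\,\mathbb{E}[T_i(t)],$$

where $\eta_\ell$ denotes the number of children of the infected node $\ell$; the second equality holds because whether a node is infected by time $t$ does not depend on its own number of children. Now, using Theorem 4.1, we have that

$$\lim_{t\to\infty} \frac{\mathbb{E}[T_i(t)]}{c\,e^{\alpha t}} = \frac{1}{\mu - 1}.$$

Therefore, we see that the means of both $Z_i(t)$ and $T_i(t)$ grow as $e^{\alpha t}$.

Since $d$ is finite, it further follows that $n(t)$, the number of rumor infected nodes up to time $t$, or equivalently $1 + \sum_{i=1}^d T_i(t)$, scales like $e^{\alpha t}$. Using these facts under event $\mathcal{E}(x)$ and an application of (4.20) with $n$ replaced by $n(t)$ (and taking $t \to \infty$, or equivalently $n(t) \to \infty$), we obtain that for $1 \le i \le d$, as $t \to \infty$,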



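$$\frac{T_i(t)}{n(t)} \cdot \frac{M^i_{n(t)}}{T_i(t)} = \frac{M^i_{n(t)}}{n(t)} \to 0. \tag{4.21}$$

Since under event $\mathcal{E}(x)$, $T_i(t)/n(t)$ remains bounded away from 0 as $t \to \infty$, from (4.21) it follows that

$$\frac{M^i_{n(t)}}{T_i(t)} \to 0. \tag{4.22}$$

But

$$M^i_{n(t)} = \sum_{\ell \in T_i(t)} (\eta_\ell - \mu) = \sum_{\ell \in T_i(t)} \eta_\ell - \mu\,T_i(t). \tag{4.23}$$

From (4.22) and (4.23), it follows that under event $\mathcal{E}(x)$, $x > 0$, for $1 \le i \le d$, as $t \to \infty$,

$$\frac{1}{T_i(t)} \sum_{\ell \in T_i(t)} \eta_\ell \to \mu. \tag{4.24}$$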

Now we are ready to conclude the proof of Theorem 3.2 by establishing that under the event $\mathcal{E}(x)$, $x > 0$, the ratios $Z_i(t)/\sum_{j=1}^d Z_j(t)$ and $T_i(t)/\sum_{j=1}^d T_j(t)$ converge to the same quantity. To that end, using $Z_i(t) = 1 + \sum_{\ell \in T_i(t)} (\eta_\ell - 1)$, observe that for $1 \le i \le d$,

$$\frac{Z_i(t)}{\sum_{j=1}^d Z_j(t)} = \frac{T_i(t)}{n(t)} \cdot \frac{\frac{1}{T_i(t)} + \frac{1}{T_i(t)}\sum_{\ell \in T_i(t)} (\eta_\ell - 1)}{\frac{d}{n(t)} + \frac{1}{n(t)}\sum_{j=1}^d \sum_{\ell \in T_j(t)} (\eta_\ell - 1)}. \tag{4.25}$$

Now as $t \to \infty$, under the event $\mathcal{E}(x)$, $x > 0$, the rightmost term in (4.25) goes to 1, since the numerator and the denominator both go to $\mu - 1$ due to $T_i(t), n(t) \to \infty$ and (4.24) (applied to all the subtrees). This shows that under event $\mathcal{E}(x)$, the ratios $Z_i(t)/\sum_{j=1}^d Z_j(t)$ and $T_i(t)/\sum_{j=1}^d T_j(t)$ are asymptotically equal as $t \to \infty$ (recall that $n(t) = 1 + \sum_{j=1}^d T_j(t)$). Therefore, from (4.18) and the fact that the conditioning event $\{\eta_0 \ge 3\}$ has strictly positive probability, it follows that $\liminf_{t\to\infty} \mathbb{P}\big(C^1_{n(t)}\big) > 0$.

This concludes the proof of Theorem 3.2.

4.5. Proof of Theorem 3.3. To obtain the upper bound in Theorem 3.3 on $\lim_{t\to\infty} \mathbb{P}\big(C^k_{n(t)}\big)$, assume that by time $t$ at least $k$ nodes have been infected ($n(t) \ge k$), with the $k$th infected node denoted by $v_k$, which has degree $d$. The number of children of $v_k$ is $\eta_k = d - 1 \ge 2$. $C^k_{n(t)}$ is the event that $v_k$ is the rumor center at time $t$. To upper bound the probability of this event, we will crucially use the memoryless property of the exponential spreading times. There are $d$ subtrees neighboring $v_k$. We denote the time when $v_k$ is infected by $t_k \le t$. We denote by $Z(t_k)$ the size of the rumor boundary at $t_k$, which consists of all uninfected nodes neighboring infected nodes. We have that $Z(t_k) = 1 + \sum_{i=1}^k (\eta_i - 1)$. Because of the memoryless property of the exponential spreading times and the way in which the random tree is constructed, at time $t_k$ each node in the rumor boundary starts an independent copy of identically distributed subtree random processes, which we will refer to as $X_j(t)$, for $1 \le j \le Z(t_k)$.

We now use the notation from Section 4.1.4 for the subtree processes. Specifically, let us imagine the node $v_k$ as the global root and, with respect to it, let $T_1^k(t)$ denote the size of the subtree rooted at the parent $w_1$ of $v_k$ at time $t$. Let $T_i^k(t)$ for $i = 2, \dots, d$ be the other subtrees, rooted at the children of $v_k$ (which were not infected at time $t_k$ but were only part of the rumor boundary). Then, we have that $T_1^k(t_k) = k - 1$ and $T_i^k(t_k) = 0$ for $2 \le i \le d$. In $T_1^k(t_k)$, there are $Z(t_k) - \eta_k$ nodes on the rumor boundary, each of which will be the source of an i.i.d. subtree process $X_j(t)$, $1 \le j \le Z(t_k) - \eta_k$, starting at $t = t_k$. Therefore,




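$$\mathbb{P}\big(C^k_{n(t)}\big) \le \mathbb{P}\Big(\bigcap_{i=1}^d \Big\{2T_i^k(t) \le 1 + \sum_{j=1}^d T_j^k(t)\Big\}\Big) \le \mathbb{P}\Big(T_1^k(t) \le 1 + \sum_{i=2}^{d} T_i^k(t)\Big)$$
$$= \mathbb{P}\Big(k - 1 + \sum_{j=1}^{Z(t_k)-\eta_k} X_j(t) \le 1 + \sum_{j=Z(t_k)-\eta_k+1}^{Z(t_k)} X_j(t)\Big) \le \mathbb{P}\Big(\sum_{j=1}^{Z(t_k)-\eta_k} X_j(t) \le \sum_{j=Z(t_k)-\eta_k+1}^{Z(t_k)} X_j(t)\Big), \tag{4.26}$$

where the last step uses $k \ge 2$. By the condition of Theorem 3.3, we have that $Z(t_k) \ge (ck + 1)\eta_k$ for some $c > 1$. Define $X'_\ell(t) = \sum_{j=(\ell-1)\eta_k+1}^{\ell\eta_k} X_j(t)$, i.e. the sum over the $\ell$th block of $\eta_k$ consecutive subtree processes. Then

$$\mathbb{P}\big(C^k_{n(t)}\big) \le \mathbb{P}\Big(\sum_{\ell=1}^{\lfloor ck+1 \rfloor} X'_\ell(t) \le X'_{\lfloor ck+1\rfloor+1}(t)\Big) \le \mathbb{P}\Big(\bigcap_{\ell=1}^{\lfloor ck+1\rfloor} \big\{X'_\ell(t) \le X'_{\lfloor ck+1\rfloor+1}(t)\big\}\Big) \le \frac{1}{\lfloor ck+1\rfloor + 1} \tag{4.27}$$

$$\le \frac{1}{ck}. \tag{4.28}$$

Above we have used the fact that for any fixed $t > t_k$, there are in total $\lfloor ck+1\rfloor + 1$ independent copies of the random variables $X'_\ell(t)$, and the probability in (4.27) is the probability that a given one of these random variables is larger than the rest, which is $1/(\lfloor ck+1\rfloor + 1)$ by symmetry. The above bound is independent of $t$, so this establishes Theorem 3.3.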

4.6. Proof of Theorem 3.4: geometric trees. The proof of Theorem 3.4 uses the characterization of the rumor center provided by Lemma 1. That is, we wish to show that for all $n$ large enough, the $d_{v^*}$ rumor infected subtrees of the source $v^*$ are essentially 'balanced' with high enough probability. To establish this, we shall use coarse estimates of the size of each of these subtrees, using the standard concentration property of renewal processes along with geometric growth. This is unlike the proof for regular trees, where we necessarily had to delve into very fine-grained probabilistic estimates of the sizes of the subtrees to establish the result. This relatively easier proof for geometric trees (despite heterogeneity) brings out the fact that it is fundamentally much more difficult to analyze expanding trees than geometric structures, as expanding trees do not yield to generic concentration based estimates since they necessarily have very high variances.

To that end, we shall start by obtaining sharp estimates of the size of each of the $d_{v^*}$ rumor infected subtrees of $v^*$ for any given time $t$. We assume here that the spreading times have a distribution with CDF $F$ with mean $\mu > 0$ and an exponential tail (precisely, if $X$ is a random variable with $F$ as its CDF, then $\mathbb{E}[\exp(\theta X)] < \infty$ for $\theta \in (-\varepsilon, \varepsilon)$ for some $\varepsilon > 0$). Initially, at time 0, the source node $v^*$ has the rumor. It starts spreading along its $d_{v^*}$ children (neighbors). Let $T_i(t)$ denote the size of the rumor infected subtree, denoted by $G_i(t)$, rooted at the $i$th child (or neighbor) of node $v^*$. Initially, $T_i(0) = 0$. The process $T_i(\cdot)$ is a renewal process with time-varying rate: the rate at time $t$ depends on the 'boundary' of the tree, as discussed earlier. Due to the balanced and geometric growth conditions assumed in Theorem 3.4, the following will be satisfied with high probability for small enough $\epsilon > 0$: (a) every node within distance $\frac{t}{\mu}(1-\epsilon)$ of $v^*$ is in one of the $G_i(t)$, and (b) no node beyond distance $\frac{t}{\mu}(1+\epsilon)$ of $v^*$ is in any of the $G_i(t)$. Such a tight characterization of the 'shape' of $G_i(t)$, along with the polynomial growth, will provide a bound on $T_i(t)$ sharp enough to establish Theorem 3.4. This result is summarized below, with its proof in Section 4.6.1.

Proposition 1. Consider a geometric tree with parameters $\alpha > 0$ and $0 < b \le c$ as assumed in Theorem 3.4, and let the rumor spread from source $v^*$ starting at time 0 as per the SI model with a spreading time distribution whose cumulative distribution function $F$ has mean $\mu$ and satisfies $\mathbb{E}[\exp(\theta X)] < \infty$ for $\theta \in (-\varepsilon, \varepsilon)$ for some $\varepsilon > 0$, where $X$ is distributed as per $F$. Define $\epsilon = t^{-1/2+\delta}$ for any small $0 < \delta < 1/2$. Let $G(t)$ be the set of all rumor infected nodes in the tree at time $t$. Let $\mathcal{G}_t$ be the set of all subtrees rooted at $v^*$ (rumor graphs) such that all nodes within distance $\frac{t}{\mu}(1-\epsilon)$ from $v^*$ are in the tree and no node beyond distance $\frac{t}{\mu}(1+\epsilon)$ from $v^*$ is in the tree. Then $\mathbb{P}\big(G(t) \in \mathcal{G}_t\big) \to 1$ as $t \to \infty$.

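Define $\mathcal{E}_t$ as the event that $G(t) \in \mathcal{G}_t$. Under event $\mathcal{E}_t$, consider the sizes of the subtrees $T_i(t)$ for $1 \le i \le d_{v^*}$. Due to the polynomial growth condition and $\mathcal{E}_t$, we obtain bounds on each $T_i(t)$, $1 \le i \le d_{v^*}$, in terms of sums of $r^\alpha$ over the distances $r$ covered by the subtree. Now bounding the summations by Riemann integrals, we have, for any integer $R \ge 1$,

$$\int_0^{R} r^\alpha\,dr \le \sum_{r=1}^{R} r^\alpha \le \int_0^{R+1} r^\alpha\,dr.$$

Therefore, it follows that under event $\mathcal{E}_t$, for all $1 \le i \le d_{v^*}$,

$$\frac{b}{1+\alpha}\Big(\frac{t}{\mu}(1-\epsilon) - 2\Big)^{1+\alpha} \le T_i(t) \le \frac{c}{1+\alpha}\Big(\frac{t}{\mu}(1+\epsilon)\Big)^{1+\alpha}.$$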

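In the most 'unbalanced' situation, $d_{v^*} - 1$ of these subtrees have the minimal size $T_{\min}(t)$ and the remaining subtree has size $T_{\max}(t)$, where

$$T_{\min}(t) = \frac{b}{1+\alpha}\Big(\frac{t}{\mu}(1-\epsilon) - 2\Big)^{1+\alpha}, \qquad T_{\max}(t) = \frac{c}{1+\alpha}\Big(\frac{t}{\mu}(1+\epsilon)\Big)^{1+\alpha}.$$

Since by assumption $c < b(d_{v^*} - 1)$, there exists $\gamma > 0$ such that $(1+\gamma)c < b(d_{v^*} - 1)$. Therefore, for any choice of $\epsilon = t^{-1/2+\delta}$ for some $\delta \in (0, 1/2)$, we have

$$\frac{(d_{v^*} - 1)T_{\min}(t) + 1}{T_{\max}(t)} = \frac{b(d_{v^*} - 1)}{c} \cdot \frac{\big(\frac{t}{\mu}(1-\epsilon) - 2\big)^{1+\alpha}}{\big(\frac{t}{\mu}(1+\epsilon)\big)^{1+\alpha}} + \frac{1+\alpha}{c\,\big(\frac{t}{\mu}(1+\epsilon)\big)^{1+\alpha}}$$
$$\ge (1+\gamma)\,\frac{\big(\frac{t}{\mu}(1-\epsilon) - 2\big)^{1+\alpha}}{\big(\frac{t}{\mu}(1+\epsilon)\big)^{1+\alpha}} + \frac{1+\alpha}{c\,\big(\frac{t}{\mu}(1+\epsilon)\big)^{1+\alpha}} \overset{(i)}{>} 1,$$

for $t$ large enough, since as $t \to \infty$ the ratio multiplying $(1+\gamma)$ in the first term of inequality (i) goes to 1 and the second term goes to 0. From this, it immediately follows that under event $\mathcal{E}_t$, for $t$ large enough,

$$\max_{1 \le i \le d_{v^*}} T_i(t) < \frac{1}{2}\Big(1 + \sum_{i=1}^{d_{v^*}} T_i(t)\Big).$$

Therefore, by Lemma 1 it follows that the rumor center is unique and equals $v^*$. We also have that $\mathcal{E}_t \subseteq C^1_{n(t)}$ for $t$ large enough. Thus, from the above and Proposition 1, we obtain

$$\lim_{t\to\infty} \mathbb{P}\big(C^1_{n(t)}\big) \ge \lim_{t\to\infty} \mathbb{P}(\mathcal{E}_t) = 1.$$

This completes the proof of Theorem 3.4.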

4.6.1. Proof of Proposition 1. We recall that Proposition 1 states that for a rumor spreading for time $t$ as per the SI model with a general spreading time distribution with mean $\mu$, the rumor graph on a geometric tree is, with high probability, full up to distance $\frac{t}{\mu}(1-\epsilon)$ and does not extend beyond distance $\frac{t}{\mu}(1+\epsilon)$, for $\epsilon = t^{-1/2+\delta}$ for some $\delta \in (0, 1/2)$. To establish this, we shall use the following well known concentration property of renewal processes. We provide its proof later for completeness.
Proposition 2. Consider a renewal process $P(\cdot)$ with holding times with mean $\mu$ and a finite moment generating function in the interval $(-\varepsilon, \varepsilon)$ for some $\varepsilon > 0$. Then for any $t > 0$ and any $\gamma \in (0, \varepsilon')$ for a small enough $\varepsilon' > 0$, there exists a positive constant $c$ such that

$$\mathbb{P}\Big(\Big|P(t) - \frac{t}{\mu}\Big| > \frac{\gamma t}{\mu}\Big) \le 2e^{-\frac{\gamma^2 \mu t}{8c}}.$$
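A quick simulation illustrates the concentration in Proposition 2: the relative deviation of $P(t)$ from $t/\mu$ becomes rare as $t$ grows. The sketch assumes, purely for illustration, Uniform(0, 2) holding times (mean $\mu = 1$, finite moment generating function) and $\gamma = 0.2$; the helper names are hypothetical.

```python
import random

def renewal_count(t, sample_holding, rng):
    """Number of arrivals P(t) by time t of a renewal process."""
    s, count = 0.0, 0
    while True:
        s += sample_holding(rng)
        if s > t:
            return count
        count += 1

def deviation_frequency(t, mu, gamma, sample_holding, runs=500, seed=7):
    """Empirical P(|P(t) - t/mu| > gamma * t / mu) over independent runs."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(runs):
        if abs(renewal_count(t, sample_holding, rng) - t / mu) > gamma * t / mu:
            bad += 1
    return bad / runs

# Illustrative holding times: Uniform(0, 2), which has mean mu = 1.
uniform = lambda rng: rng.uniform(0.0, 2.0)
for t in (10.0, 100.0, 1000.0):
    print(t, deviation_frequency(t, 1.0, 0.2, uniform))
```

For $t = 10$ a sizable fraction of runs deviates by more than 20%, while for $t = 1000$ essentially none do, in line with the $e^{-\Theta(\gamma^2 t)}$ decay.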



Now we use Proposition 2 to establish Proposition 1. Recall that the spreading time along each edge is an independent and identically distributed random variable with mean $\mu$. Now the underlying network graph is a tree. Therefore, for any node $v$ at distance $r$ from the source node $v^*$, there is a unique path (of length $r$) connecting $v$ and $v^*$. Then, the spread of the rumor along this path can be thought of as a renewal process, say $P(t)$, and node $v$ is infected by time $t$ if and only if $P(t) \ge r$. Therefore, from Proposition 2 it follows that for any node $v$ at distance $\frac{t}{\mu}(1-\epsilon)$ from $v^*$, with $\epsilon = t^{-1/2+\delta}$ for some $\delta \in (0, 1/2)$ (for all $t$ large enough),

$$\mathbb{P}(v \text{ is not rumor infected}) \le 2e^{-\frac{\epsilon^2 \mu t}{8c}} = 2e^{-\frac{\mu t^{2\delta}}{8c}}.$$

Now the number of such nodes at distance $\frac{t}{\mu}(1-\epsilon)$ from $v^*$ is at most $O(t^{\alpha+1})$ (which follows from arguments similar to those in the proof of Theorem 3.4). Therefore, by an application of the union bound, it follows that

$$\mathbb{P}\Big(\text{a node at distance } \tfrac{t}{\mu}(1-\epsilon) \text{ from } v^* \text{ isn't infected}\Big) = O\Big(t^{\alpha+1} e^{-\frac{\mu t^{2\delta}}{8c}}\Big).$$

Using a similar argument and another application of Proposition 2, it can be argued that

$$\mathbb{P}\Big(\text{a node at distance } \tfrac{t}{\mu}(1+\epsilon) \text{ from } v^* \text{ is infected}\Big) = O\Big(t^{\alpha+1} e^{-\frac{\mu t^{2\delta}}{8c}}\Big).$$

Since the rumor is a 'spreading' process, if all nodes at distance $r$ from $v^*$ are infected, then so are all nodes at distance $r' < r$ from $v^*$; if all nodes at distance $r$ from $v^*$ are not infected, then neither are any nodes at distance $r' > r$ from $v^*$. Therefore, it follows that with probability $1 - O\big(t^{\alpha+1} e^{-\mu t^{2\delta}/(8c)}\big)$, all nodes at distance up to $\frac{t}{\mu}(1-\epsilon)$ from $v^*$ are infected and all nodes beyond distance $\frac{t}{\mu}(1+\epsilon)$ from $v^*$ are not infected. This completes the proof of Proposition 1.
4.6.2. Proof of Proposition 2. We wish to bound the probabilities of the events $\{P(t) < \frac{t}{\mu}(1-\gamma)\}$ and $\{P(t) > \frac{t}{\mu}(1+\gamma)\}$ for a renewal process $P(\cdot)$ with holding times with mean $\mu$ and a finite moment generating function. Define the $n$th arrival time $S_n$ as

$$S_n = \sum_{i=1}^n X_i,$$

where the $X_i$ are non-negative i.i.d. random variables with a well defined moment generating function $M_X(\theta) = \mathbb{E}[\exp(\theta X)] < \infty$ for $\theta \in (-\varepsilon, \varepsilon)$ for some $\varepsilon > 0$, and mean $\mathbb{E}[X_i] = \mu > 0$. We can relate the arrival times to the renewal process by the following relations:

$$\mathbb{P}\big(P(t) < n\big) = \mathbb{P}(S_n > t)$$

and

$$\mathbb{P}\big(P(t) \ge n\big) = \mathbb{P}(S_n \le t).$$

The first relation says that the probability of fewer than $n$ arrivals by time $t$ is equal to the probability that the $n$th arrival happens after time $t$. The second relation says that the probability of at least $n$ arrivals by time $t$ is equal to the probability that the $n$th arrival happens by time $t$.

We now bound $\mathbb{P}(S_n > t)$. To that end, for $\theta \in (0, \varepsilon)$ it follows from the Chernoff bound that

$$\mathbb{P}(S_n > t) = \mathbb{P}\big(e^{\theta S_n} > e^{\theta t}\big) \le M_X(\theta)^n e^{-\theta t}.$$

We can use the following approximation of $M_X(\theta)$, which is valid for small $\theta$, say $\theta \in (0, \varepsilon^+)$ for $0 < \varepsilon^+ < \varepsilon$:

$$M_X(\theta) = 1 + \theta\mu + \theta^2\Big(\frac{\mathbb{E}[X^2]}{2} + \sum_{i=3}^\infty \frac{\theta^{i-2}\,\mathbb{E}[X^i]}{i!}\Big) \le 1 + \theta\mu + c_1\theta^2$$

for some finite positive constant $c_1$. Using this along with the inequality $\log(1 + x) \le x$ for $-1 < x$, we obtain

$$\log \mathbb{P}(S_n > t) \le n\log M_X(\theta) - \theta t \le n\log(1 + \theta\mu + c_1\theta^2) - \theta t \le \theta(\mu n - t) + nc_1\theta^2.$$
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To minimize this probability, we find the 9 that minimizes 9 [iin — t) + nci9'^. This 
happens for 9 = ^ET (n ~ set n = ^ (1 — 7), so the minimum value is 

achieved for 9* = 2cji-'y) - Therefore, there exists ei > so that for 7 e (0, ei), 
the corresponding 9* = 2ciji-'y) ^ quadratic approximation of 

Mx (9) is valid. Given this, we obtain 

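$$\begin{aligned}
\log P(S_n > t) &\le -\frac{\mu\gamma}{2c_1(1-\gamma)}\,\gamma t + \frac{t}{\mu}(1-\gamma)\,c_1\frac{\mu^2\gamma^2}{4c_1^2(1-\gamma)^2} \\
&= -\frac{\gamma^2\mu t}{2c_1(1-\gamma)} + \frac{\gamma^2\mu t}{4c_1(1-\gamma)} \\
&= -\frac{\gamma^2\mu t}{4c_1(1-\gamma)} \\
&\le -\frac{\gamma^2\mu t}{8c_1}.
\end{aligned}$$

With this result, we obtain 

$$P\left(P(t) < \frac{t}{\mu}(1-\gamma)\right) \le e^{-\frac{\gamma^2\mu t}{8c_1}}$$

for any $t$ and $\gamma \in (0, \epsilon_1)$. For the upper bound, we have for $\theta > 0$ 

$$P(S_n \le t) = P\left(e^{-\theta S_n} \ge e^{-\theta t}\right) \le M_X(-\theta)^n e^{\theta t}.$$

We can use the following approximation for $M_X(-\theta)$, which is valid for small 
enough $\theta \in (0, \epsilon^{-})$ with $0 < \epsilon^{-} < \epsilon$: 

$$M_X(-\theta) = 1 - \theta\mu + \theta^2\frac{E[X^2]}{2} + \theta^3\sum_{i=3}^{\infty}\theta^{i-3}\frac{(-1)^i E[X^i]}{i!} \le 1 - \theta\mu + c_2\theta^2$$

for some finite positive constant $c_2$. Using this we obtain 

$$\begin{aligned}
\log P(S_n \le t) &\le n\log M_X(-\theta) + \theta t \\
&\le n\log\left(1 - \theta\mu + c_2\theta^2\right) + \theta t \\
&\le \theta(t - \mu n) + n c_2\theta^2.
\end{aligned}$$

To minimize this bound, we find the $\theta$ that minimizes $\theta(t - \mu n) + n c_2\theta^2$. 
This happens for $\theta = \frac{\mu n - t}{2c_2 n}$. We set $n = \frac{t}{\mu}(1+\gamma)$, so the minimum value 
is achieved for $\theta^* = \frac{\mu\gamma}{2c_2(1+\gamma)}$. There exists $\epsilon_2 > 0$ so that for all $\gamma \in (0, \epsilon_2)$, 
$\theta^* = \frac{\mu\gamma}{2c_2(1+\gamma)} < \epsilon^{-}$, thus guaranteeing the validity of the quadratic approximation 
of $M_X(-\theta)$ that we have assumed. Subsequently, we obtain 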

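$$\begin{aligned}
\log P(S_n \le t) &\le -\frac{\mu\gamma}{2c_2(1+\gamma)}\,\gamma t + \frac{t}{\mu}(1+\gamma)\,c_2\frac{\mu^2\gamma^2}{4c_2^2(1+\gamma)^2} \\
&= -\frac{\gamma^2\mu t}{2c_2(1+\gamma)} + \frac{\gamma^2\mu t}{4c_2(1+\gamma)} \\
&= -\frac{\gamma^2\mu t}{4c_2(1+\gamma)} \\
&\le -\frac{\gamma^2\mu t}{8c_2},
\end{aligned}$$

where the last inequality uses $1 + \gamma < 2$, which holds once $\epsilon_2 \le 1$ (this can always be 
arranged by shrinking $\epsilon_2$). With this result, we obtain 

$$P\left(P(t) > \frac{t}{\mu}(1+\gamma)\right) \le e^{-\frac{\gamma^2\mu t}{8c_2}}$$

for any $t$ and $\gamma \in (0, \epsilon_2)$. 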
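Both tails reduce to minimizing a quadratic in $\theta$. The Python sketch below, with hypothetical function names and arbitrary test values for $t$, $\mu$, $c$, $\gamma$ (a single `c` stands in for $c_1$ or $c_2$), evaluates the closed-form minimizers $\theta^*$ and minimum exponents $-\gamma^2\mu t/(4c(1\mp\gamma))$ from the two derivations, compares them against a brute-force grid search, and checks the relaxation to $-\gamma^2\mu t/(8c)$ for $\gamma < 1$.

```python
def exponent(theta, n, t, mu, c, tail):
    """Quadratic Chernoff exponent from the proof.

    lower tail: theta*(mu*n - t) + n*c*theta^2
    upper tail: theta*(t - mu*n) + n*c*theta^2
    """
    drift = (mu * n - t) if tail == "lower" else (t - mu * n)
    return theta * drift + n * c * theta ** 2


def closed_form(t, mu, c, gamma, tail):
    """Return (n, theta_star, minimum exponent) using the formulas in the text."""
    s = -1.0 if tail == "lower" else 1.0  # n = (t/mu)(1 + s*gamma)
    n = t / mu * (1.0 + s * gamma)
    theta_star = mu * gamma / (2.0 * c * (1.0 + s * gamma))
    value = -gamma ** 2 * mu * t / (4.0 * c * (1.0 + s * gamma))
    return n, theta_star, value


def grid_min(t, mu, c, gamma, tail, theta_max=1.0, steps=100_000):
    """Brute-force minimum of the exponent over a grid on [0, theta_max]."""
    n = closed_form(t, mu, c, gamma, tail)[0]
    return min(
        exponent(theta_max * k / steps, n, t, mu, c, tail) for k in range(steps + 1)
    )


t, mu, c, gamma = 100.0, 1.5, 2.0, 0.2
for tail in ("lower", "upper"):
    n, theta_star, value = closed_form(t, mu, c, gamma, tail)
    relaxed = -gamma ** 2 * mu * t / (8.0 * c)  # the gamma < 1 relaxation
    print(tail, theta_star, value, grid_min(t, mu, c, gamma, tail), value <= relaxed)
```

The grid search is deliberately naive; it only serves to confirm that the algebra for $\theta^*$ and the minimum value has no sign or factor errors.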

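If we set $c = \max(c_1, c_2)$ and $\epsilon' = \min(\epsilon_1, \epsilon_2)$ and combine the upper and 
lower bounds, then we obtain 

$$P\left(\left|P(t) - \frac{t}{\mu}\right| > \gamma\frac{t}{\mu}\right) \le 2e^{-\frac{\gamma^2\mu t}{8c}}$$

for any $t$ and $\gamma \in (0, \epsilon')$ with $\epsilon' > 0$. This completes the proof of Proposition 2. 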
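As a sanity check that is not part of the proof, one can simulate a renewal process and compare the empirical two-sided tail with the bound $2e^{-\gamma^2\mu t/(8c)}$. The sketch assumes exponential holding times with mean $\mu = 1$, for which $c = \max(c_1, c_2) = 2$ is a valid choice (since $1/(1-\theta) \le 1 + \theta + 2\theta^2$ for $\theta \le 1/2$ and $1/(1+\theta) \le 1 - \theta + \theta^2$), and $\gamma \le 2/3$ keeps $\theta^*$ within $(0, 1/2]$. The helper names, seed, and trial counts are arbitrary choices.

```python
import math
import random


def renewal_count(t, rng, mean=1.0):
    """Simulate P(t): number of renewals by time t with exponential holding times."""
    count, clock = 0, 0.0
    while True:
        clock += rng.expovariate(1.0 / mean)  # expovariate takes the rate 1/mean
        if clock > t:
            return count
        count += 1


def empirical_tail(t, gamma, trials, seed=0, mean=1.0):
    """Fraction of runs with |P(t) - t/mean| > gamma * t/mean."""
    rng = random.Random(seed)
    center = t / mean
    hits = sum(
        abs(renewal_count(t, rng, mean) - center) > gamma * center
        for _ in range(trials)
    )
    return hits / trials


def chernoff_bound(t, gamma, c=2.0, mu=1.0):
    """Two-sided bound 2 exp(-gamma^2 mu t / (8c)); c = 2 assumes Exp(mean 1) holding times."""
    return 2.0 * math.exp(-gamma ** 2 * mu * t / (8.0 * c))


for t, gamma in [(200.0, 0.5), (400.0, 0.4)]:
    print(t, gamma, empirical_tail(t, gamma, trials=1000), chernoff_bound(t, gamma))
```

For these parameters the empirical tail is far below the bound, which is expected of a Chernoff-type estimate built on a crude quadratic constant.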

5. Conclusion. Finding the source of a rumor in a network is an important 
and challenging problem. Here we characterized the performance of the rumor 
source estimator known as rumor centrality for generic tree graphs. Our analysis 
was based upon multi-type continuous time branching processes/generalized 
Polya's urn models. As an implication of this novel analysis method, we recovered 
all the previous results for regular trees [10] as a special case. We also showed 
that for rumor spreading on a random regular graph, the probability that the 
estimated source is more than k hops away from the true source decays exponentially 
in k. Additionally, we showed that for general random trees, and hence for sparse 
random graphs like the Erdős–Rényi graph, there is a strictly positive probability of 
correct rumor source detection. In summary, we have established the universality 
of rumor centrality as a source estimator across a variety of tree-structured graphs 
and across a variety of SI spreading time distributions. 